Foundations of Analysis


Pure and Applied Undergraduate Texts, Volume 18




Joseph L. Taylor 


American Mathematical Society
Providence, Rhode Island


EDITORIAL COMMITTEE 


Paul J. Sally, Jr. (Chair) Joseph Silverman 
Francis Su Susan Tolman 


2010 Mathematics Subject Classification. Primary 26-01, 26Axx, 26Bxx, 26Dxx, 03Exx.


For additional information and updates on this book, visit 
www.ams.org/bookpages/amstext-18 


Library of Congress Cataloging-in-Publication Data 
Taylor, Joseph L., 1941- 
Foundations of analysis / Joseph L. Taylor. 
pages cm. — (Pure and applied undergraduate texts ; volume 18) 
Includes bibliographical references and index. 
ISBN 978-0-8218-8984-8 (alk. paper) 
1. Functional analysis. 2. Functions of real variables. I. Title.


QA320.T39 2012
515! 7a 


Copying and reprinting. Individual readers of this publication, and nonprofit libraries 
acting for them, are permitted to make fair use of the material, such as to copy a chapter for use 
in teaching or research. Permission is granted to quote brief passages from this publication in 
reviews, provided the customary acknowledgment of the source is given. 

Republication, systematic copying, or multiple reproduction of any material in this publication 
is permitted only under license from the American Mathematical Society. Requests for such 
permission should be addressed to the Acquisitions Department, American Mathematical Society, 
201 Charles Street, Providence, Rhode Island 02904-2284 USA. Requests can also be made by 
e-mail to reprint-permission@ams.org.


© 2012 by the American Mathematical Society. All rights reserved. 
The American Mathematical Society retains all rights 
except those granted to the United States Government. 
Printed in the United States of America. 


® The paper used in this book is acid-free and falls within the guidelines 
established to ensure permanence and durability. 
Visit the AMS home page at http://www.ams.org/


10 9 8 7 6 5 4 3 2 1     17 16 15 14 13 12


Contents 


Preface 


Chapter 1. The Real Numbers 
1.1. Sets and Functions
1.2. The Natural Numbers
1.3. Integers and Rational Numbers
1.4. The Real Numbers
1.5. Sup and Inf 


Chapter 2. Sequences 
2.1. Limits of Sequences 
2.2. Using the Definition of Limit 
2.3. Limit Theorems 

2.4. Monotone Sequences
2.5. Cauchy Sequences
2.6. lim inf and lim sup


Chapter 3. Continuous Functions 
3.1. Continuity 
3.2. Properties of Continuous Functions 
3.3. Uniform Continuity 


3.4. Uniform Convergence

Chapter 4. The Derivative
4.1. Limits of Functions
4.2. The Derivative
4.3. The Mean Value Theorem
4.4. L’Hôpital’s Rule

Chapter 5. The Integral
5.1. Definition of the Integral
5.2. Existence and Properties of the Integral
5.3. The Fundamental Theorems of Calculus
5.4. Logs, Exponentials, Improper Integrals

Chapter 6. Infinite Series
6.1. Convergence of Infinite Series
6.2. Tests for Convergence
6.3. Absolute and Conditional Convergence
6.4. Power Series
6.5. Taylor’s Formula

Chapter 7. Convergence in Euclidean Space
7.1. Euclidean Space
7.2. Convergent Sequences of Vectors
7.3. Open and Closed Sets
7.4. Compact Sets
7.5. Connected Sets

Chapter 8. Functions on Euclidean Space
8.1. Continuous Functions of Several Variables
8.2. Properties of Continuous Functions
8.3. Sequences of Functions
8.4. Linear Functions, Matrices
8.5. Dimension, Rank, Lines, and Planes

Chapter 9. Differentiation in Several Variables
9.1. Partial Derivatives
9.2. The Differential
9.3. The Chain Rule
9.4. Applications of the Chain Rule
9.5. Taylor’s Formula
9.6. The Inverse Function Theorem
9.7. The Implicit Function Theorem

Chapter 10. Integration in Several Variables
10.1. Integration over a Rectangle
10.2. Jordan Regions
10.3. The Integral over a Jordan Region
10.4. Iterated Integrals
10.5. The Change of Variables Formula

Chapter 11. Vector Calculus
11.1. 1-forms and Path Integrals
11.2. Change of Variables
11.3. Differential Forms of Higher Order
11.4. Green's Theorem
11.5. Surface Integrals and Stokes’s Theorem
11.6. Gauss's Theorem
11.7. Chains and Cycles

Appendix. Degrees of Infinity
A.1. Cardinality of Sets
A.2. Countable Sets
A.3. Uncountable Sets
A.4. The Axiom of Choice

Bibliography

Index


Preface 


This text evolved from notes developed for use in a two-semester undergraduate
course on foundations of analysis at the University of Utah. The course is designed
for students who have completed three semesters of calculus and one semester of
linear algebra. For most of them, this is the first mathematics course in which 
everything is proved rigorously and they are expected to not only understand proofs 
but to also create proofs. 

The course has two main goals. The first is to develop in students the mathe- 
matical maturity and sophistication they will need when they move on to senior or 
graduate level mathematics courses. The second is to present a rigorous develop- 
ment of the calculus, beginning with a study of the properties of the real number 
system. 

We have tried to present this material in a fashion which is both rigorous and 
concise, with simple, straightforward explanations. We feel that the modern ten- 
dency to expand textbooks with ever more material, excessive explanation, and 
more and more bells and whistles simply gets in the way of the student’s under- 
standing of the basic material. 


The exercises differ widely in level of abstraction and level of difficulty. They 
vary from the simple to the quite difficult and from the computational to the the- 
oretical. There are exercises that ask students to prove something or to construct 
an example with certain properties. There are exercises that ask students to apply 
theoretical material to help do a computation or to solve a practical problem. Each 
section contains a number of examples designed to illustrate the material of the 
section and to teach students how to approach the exercises for that section. The 
text uses the following convention when referring to exercises: Exercise 1.1.5 is the 
fifth exercise in Exercise Set 1.1. 


This text, in its various incarnations, has been used by the author and his 
colleagues for several years at the University of Utah. Each use has led to improve- 
ments, additions, and corrections. 




The topics covered in the text are quite standard. Chapters 1 through 6 focus 
on single-variable calculus and are normally covered in the first semester of the
course. Chapters 7 through 11 are concerned with calculus in several variables and 
are normally covered in the second semester. 


Chapter 1 begins with a section on set theory. This is followed by the in- 
troduction of the set of natural numbers as a set which satisfies Peano’s axioms. 
Subsequent sections outline the construction, beginning with the natural numbers, 
of the integers, the rational numbers, and finally the real numbers. This is only an
outline of the construction of the reals beginning with Peano’s axioms and not a
fully detailed development. Such a development would be much too time consuming
for a course of this nature. What is important is that, by the end of the chapter: 
(1) students know that the real number system is a complete, Archimedean, or- 
dered field; (2) they have some practice at using the axioms satisfied by such a 
system; and (3) they understand that this system may be constructed beginning 
with Peano’s axioms for the counting numbers. 


Chapter 2 is devoted to sequences and limits of sequences. We feel sequences 
provide the best context in which to first carry out a rigorous study of limits. The 
study of limits of functions is complicated by issues concerning the domain of the 
function. Furthermore, one has to struggle with the student’s tendency to think 
that the limit of $f(x)$ as $x$ approaches $a$ is just a pedantic way of describing $f(a)$.
These complications don’t arise in the study of limits of sequences. 


Chapter 3 provides a rigorous study of continuity for real-valued functions of 
one variable. This includes proving the existence of minimum and maximum values 
for a continuous function on a closed bounded interval as well as the Intermediate 
Value Theorem and the existence of a continuous inverse function for a strictly 
monotone continuous function. Uniform continuity is discussed, as is uniform con- 
vergence for a sequence of functions. 


The derivative is introduced in Chapter 4 and the main theorems concerning the
derivative are proved. These include the Chain Rule, the Mean Value Theorem,
existence of the derivative of an inverse function, the monotonicity theorem, and
L’Hôpital’s Rule.

In Chapter 5 the definite integral is defined using upper and lower Riemann 
sums. The main properties of the integral are proved here along with the two 
forms of the Fundamental Theorem of Calculus. The integral is used to define and 
develop the properties of the natural logarithm. This leads to the definition of the 
exponential function and the development of its properties. 

Infinite sequences and series are discussed in Chapter 6 along with Taylor's 
series and Taylor’s formula. 


The second half of the text begins in Chapter 7 with an introduction to d-dimensional
Euclidean space, $\mathbb{R}^d$, as the vector space of d-tuples of real numbers.
We review the properties of this vector space while reminding the students of the
definition and properties of general vector spaces. We study convergence of sequences
of vectors and prove the Bolzano-Weierstrass Theorem in this context. We
describe open and closed sets and discuss compactness and connectedness of sets
in Euclidean spaces. Throughout this chapter and subsequent chapters we follow
a certain philosophy concerning abstract versus concrete concepts. We briefly introduce
abstract metric spaces, inner product spaces, and normed linear spaces,
but only as an aside. We emphasize that Euclidean space is the object of study in 
this text, but we do point out now and then when a theorem concerning Euclidean 
space does or does not hold in a general metric space or inner product space or 
normed vector space. That is, the course is grounded in the concrete world of $\mathbb{R}^d$,
but the student is made aware that there are more exotic worlds in which these 
concepts are important. 


Chapter 8 is devoted to the study of continuous functions between Euclidean 
spaces. We study the basic properties of continuous functions as they relate to open 
and closed sets and compact and connected sets. The third section is devoted to 
sequences and series of functions and the concept of uniform convergence. The last 
two sections comprise a review of the topic of linear functions between Euclidean 
spaces and the corresponding matrices. This includes the study of rank, dimension 
of image and kernel, and invertible matrices. We also introduce representations of 
linear or affine subspaces in parametric form as well as solution sets of systems of
equations. 


The most important topic in the second half of the course is probably the 
study, in Chapter 9, of the total differential of a function from $\mathbb{R}^p$ to $\mathbb{R}^q$. This is
introduced in the context of affine approximation of a function near a point in its
domain. The Chain Rule for the total differential is proved in what we believe is
a novel and intuitively satisfying way. This is followed by applications of the total
differential and the Chain Rule, including the multivariable Taylor formula and the 
inverse and implicit function theorems. 


Chapter 10 is devoted to integration over Jordan regions in $\mathbb{R}^d$. The development,
using upper and lower sums, is very similar to the development of the single
variable integral in Chapter 5. Where the proofs are virtually identical to those 
in Chapter 5, they are omitted. The really new and different material here is that 
on Fubini’s Theorem and the change of variables formula. We give rigorous and
detailed proofs of both results along with a number of applications.


The chapter on vector calculus, Chapter 11, uses the modern formalism of 
differential forms. In this formalism, the major theorems of the subject (Green’s
Theorem, Stokes’s Theorem, and Gauss’s Theorem) all have the same form. We
do point out the classical forms of each of these theorems, however. Each of the 
main theorems is proved first on a rectangle or cube and then extended to more 
complicated domains through the use of transformation laws for differential forms 
and the change of variables formula for multiple integrals. Most of the chapter 
focuses on integration over sets in $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$ which can be parameterized by
smooth maps from an interval, a square or a cube, or sets which can be partitioned
into sets of this form. However, in an optional section at the end, we introduce 
integrals over p-chains and p-cycles and state the general form of Stokes’s Theorem. 


There are topics which could have been included in this text but were not. Some
of our colleagues suggested that we include an introductory chapter or section on
formal logic. We considered this but decided against it. Our feeling is that logic
at this simple level is just language used with precision. Students have been using
language for most of their lives, perhaps not always with precision, but that doesn’t
mean that they are incapable of using it with precision if required to do so. Teaching
students to be precise in their use of the language tools that they already possess 
is one of the main objectives of the course. We do not believe that beginning the 
course with a study of formal logic would be of much help in this regard and, in 
fact, might just get in the way. 

We could also have included a chapter on Fourier series. However, we felt that 
the material that has been included makes for a text that is already a challenge 
to cover in a two-semester course. We feel it to be unrealistic to think that an 
additional chapter at the end would often get covered. In any case, the study of 
Fourier series is most naturally introduced at the undergraduate level in a course 
in differential equations. 


We have included an appendix on cardinality at the end of the text. We discuss 
finite, countable, and uncountable sets. We show that the rationals are countable 
and the reals are not. We show that given any set, there is always a set of larger 
cardinal. We also include a discussion of the Axiom of Choice and its consequences, 
although it is not used anywhere in the body of the text. 


Chapter 1 


The Real Numbers 


This text has two goals: (1) to develop the foundations that underlie calculus and 
all of post calculus mathematics and (2) to develop students’ ability to understand 
definitions, theorems, and proofs and to create proofs of their own — that is, to 
develop students’ mathematical sophistication. 


The typical freshman and sophomore calculus courses are designed to teach 
the techniques needed to solve problems using calculus. They are not primarily 
concerned with proving that these techniques work or teaching why they work.
The key theorems of calculus are not really proved, although sometimes proofs are
given which rely on other reasonable, but unproved, assumptions. Here we will
give rigorous proofs of the main theorems of calculus. To do this requires a solid 
understanding of the real number system and its properties. This first chapter is 
devoted to developing such an understanding. 


Our study of the real number system will follow the historical development of 
numbers: We first discuss the natural numbers or counting numbers (the positive
integers), then the integers, followed by the rational numbers. Finally, we discuss
the real number system and the property that sets it apart from the rational number 
system — the completeness property. The completeness property is the missing 
ingredient in most calculus courses. It is seldom discussed, but without it, one 
cannot prove the main theorems of calculus. 


The natural numbers can be defined as a set satisfying a very simple list of
axioms — Peano’s axioms. All of the properties of the natural numbers can be 
proved using these axioms. Once this is done, the integers, the rational numbers, 
and the real numbers can be constructed and their properties proved rigorously.
To actually carry this out would make for an interesting but rather tedious course. 
Fortunately, that is not the purpose of this course. We will not give a rigorous 
construction of the real number system beginning with Peano’s axioms, although 
we will give a brief outline of how this is done. However, the main purpose of 
this chapter is to state the properties that characterize the real number system
and develop some facility at using them in proofs. The rest of the course will be
devoted to using these properties to develop rigorous proofs of the main theorems
of calculus.


1.1. Sets and Functions 


We precede our study of the real numbers with a brief introduction to sets and
functions and their properties. This will give us the opportunity to introduce the 
set theory notation and terminology that will be used throughout the text. 


Sets. A set is a collection of objects. These objects are called the elements of
the set. If $x$ is an element of the set $A$, then we will also say that $x$ belongs to $A$ or
$x$ is in $A$. A shorthand notation for this statement that we will use extensively is

\[ x \in A. \]

Two sets $A$ and $B$ are the same set if they have the same elements; that is, if every
element of $A$ is also an element of $B$ and every element of $B$ is also an element of
$A$. In this case, we write $A = B$.


One way to define a set is to simply list its elements. For example, the statement

\[ A = \{1, 2, 3, 4\} \]

defines a set $A$ which has as elements the integers from 1 to 4.


Another way to define a set is to begin with a known set $A$ and define a new
set $B$ to be all elements $x \in A$ that satisfy a certain condition $Q(x)$. The condition
$Q(x)$ is a statement about the element $x$ which may be true for some values of $x$
and false for others. We will denote the set defined by this condition as follows:

\[ B = \{x \in A : Q(x)\}. \]

This is mathematical shorthand for the statement “$B$ is the set of all $x$ in $A$ such
that $Q(x)$”. For example, if $A$ is the set of all students in this class, then we might
define a set $B$ to be the set of all students in this class who are sophomores. In this
case, $Q(x)$ is the statement “$x$ is a sophomore”. The set $B$ is then defined by

\[ B = \{x \in A : x \text{ is a sophomore}\}. \]
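As an informal aside, set-builder notation corresponds closely to a set comprehension in a programming language. The Python sketch below is only an illustration under assumptions of our own: the names `A`, `Q`, and `B` mirror the text, but the particular finite set and the condition "is even" are hypothetical choices, not taken from the text.

```python
# Hypothetical finite stand-in for a known set A.
A = {1, 2, 3, 4, 5, 6}


def Q(x):
    """Condition Q(x): true for some elements of A and false for others."""
    return x % 2 == 0


# B = {x in A : Q(x)}, written as a set comprehension.
B = {x for x in A if Q(x)}

print(sorted(B))  # [2, 4, 6]
print(B <= A)     # True: every element of B is an element of A
```

Because the comprehension only draws elements from `A`, the resulting set is automatically a subset of `A`, just as in the mathematical definition.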


Example 1.1.1. Describe the set $(0, 3)$ of all real numbers greater than 0 and less
than 3 using set notation.

Solution: In this case the statement $Q(x)$ is the statement “$0 < x < 3$”. Thus,

\[ (0, 3) = \{x \in \mathbb{R} : 0 < x < 3\}. \]


A set $B$ is a subset of a set $A$ if $B$ consists of some of the elements of $A$; that
is, if each element of $B$ is also an element of $A$. In this case, we use the shorthand
notation

\[ B \subset A. \]

Of course, $A$ is a subset of itself. We say $B$ is a proper subset of $A$ if $B \subset A$ and
$B \neq A$.

For example, the open interval $(0, 3)$ of the preceding example is a proper subset
of the set $\mathbb{R}$ of real numbers. It is also a proper subset of the half-open interval
$(0, 3]$; that is, $(0, 3) \subset (0, 3]$, but the two are not equal because the second contains
3 and the first does not.




Figure 1.1.1. Intersection and Union of Two Sets (panels: $A \cap B$ and $A \cup B$).


There is one special set that is a subset of every set. This is the empty set $\emptyset$.
It is the set with no elements. Since it has no elements, the statement that “each
of its elements is also an element of $A$” is true no matter what the set $A$ is. Thus,
by the definition of subset,

\[ \emptyset \subset A \]

for every set $A$.

If $A$ and $B$ are sets, then the intersection of $A$ and $B$, denoted $A \cap B$, is the
set of all objects that are elements of $A$ and of $B$. That is,

\[ A \cap B = \{x : x \in A \text{ and } x \in B\}. \]

Similarly, the union of $A$ and $B$, denoted $A \cup B$, is the set of objects which are
elements of $A$ or elements of $B$ (possibly elements of both). That is,

\[ A \cup B = \{x : x \in A \text{ or } x \in B\}. \]


Example 1.1.2. If $A$ is the closed interval $[-1, 3]$ and $B$ is the open interval $(1, 5)$,
describe $A \cap B$ and $A \cup B$.

Solution: $A \cap B = (1, 3]$ and $A \cup B = [-1, 5)$.


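If $\mathcal{A}$ is a (possibly infinite) collection of sets, then the intersection and union of
the sets in $\mathcal{A}$ are defined to be

\[ \bigcap \mathcal{A} = \{x : x \in A \text{ for all } A \in \mathcal{A}\} \]

and

\[ \bigcup \mathcal{A} = \{x : x \in A \text{ for some } A \in \mathcal{A}\}. \]

Note how crucial the distinction between “for all” and “for some” is in these definitions.

The intersection $\bigcap \mathcal{A}$ is also often denoted

\[ \bigcap_{A \in \mathcal{A}} A \quad \text{or} \quad \bigcap_{s \in S} A_s \]

if the sets in $\mathcal{A}$ are indexed by some index set $S$. Similar notation is often used for
the union.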




Example 1.1.3. If $\mathcal{A}$ is the collection of all intervals of the form $[s, 2]$ where
$0 < s < 1$, find $\bigcap \mathcal{A}$ and $\bigcup \mathcal{A}$.

Solution: A number $x$ is in the set

\[ \bigcap \mathcal{A} = \bigcap_{s \in (0,1)} [s, 2] \]

if and only if

\[ s \le x \le 2 \quad \text{for every positive } s < 1. \tag{1.1.1} \]

Clearly every $x$ in the interval $[1, 2]$ satisfies this condition. We will show that no
points outside this interval satisfy (1.1.1).

Certainly an $x > 2$ does not satisfy (1.1.1), and an $x \le 0$ fails it for every positive
$s$. If $0 < x < 1$, then $s = x/2 + 1/2$ (the midpoint between $x$ and 1) is a positive
number less than 1 but greater than $x$, and so such an $x$ also fails to satisfy (1.1.1).
This proves that

\[ \bigcap \mathcal{A} = [1, 2]. \]

A number $x$ is in the set

\[ \bigcup \mathcal{A} = \bigcup_{s \in (0,1)} [s, 2] \]

if and only if

\[ s \le x \le 2 \quad \text{for some positive } s < 1. \tag{1.1.2} \]

Every such $x$ is in the interval $(0, 2]$. Conversely, we will show that every $x$ in this
interval satisfies (1.1.2). In fact, if $x \in [1, 2]$, then $x$ satisfies (1.1.2) for every positive $s < 1$.
If $x \in (0, 1)$, then $x$ satisfies (1.1.2) for $s = x/2$. This proves that

\[ \bigcup \mathcal{A} = (0, 2]. \]
If $B \subset A$, then the set of all elements of $A$ which are not elements of $B$ is called
the complement of $B$ in $A$. This is denoted $A \setminus B$. Thus,

\[ A \setminus B = \{x \in A : x \notin B\}. \]

Here, of course, the notation $x \notin B$ is shorthand for the statement “$x$ is not an
element of $B$”.


If all the sets in a given discussion are understood to be subsets of a given
universal set $X$, then we may use the notation $B^c$ for $X \setminus B$ and call it simply the
complement of $B$. This will often be the case in this text, with the universal set
being the set $\mathbb{R}$ of real numbers or, in later chapters, real $n$-dimensional space $\mathbb{R}^n$
for some $n$.


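Example 1.1.4. If $A$ is the interval $[-2, 2]$ and $B$ is the interval $[0, 1]$, describe
$A \setminus B$ and the complement $B^c$ of $B$ in $\mathbb{R}$.

Solution: We have

\[ A \setminus B = [-2, 0) \cup (1, 2] = \{x \in \mathbb{R} : -2 \le x < 0 \text{ or } 1 < x \le 2\}, \]

while

\[ B^c = (-\infty, 0) \cup (1, \infty) = \{x \in \mathbb{R} : x < 0 \text{ or } 1 < x\}. \]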




Theorem 1.1.5. If $A$ and $B$ are subsets of a set $X$ and $A^c$ and $B^c$ are their
complements in $X$, then

(a) $(A \cup B)^c = A^c \cap B^c$; and
(b) $(A \cap B)^c = A^c \cup B^c$.

Proof. We prove (a) first. To show that two sets are equal, we must show that
they have the same elements. An element $x$ of $X$ belongs to $(A \cup B)^c$ if and only if
it is not in $A \cup B$. This is true if and only if it is not in $A$ and it is not in $B$. By
definition this is true if and only if $x \in A^c \cap B^c$. Thus, $(A \cup B)^c$ and $A^c \cap B^c$ have
the same elements and, hence, are the same set.

If we apply part (a) with $A$ and $B$ replaced by $A^c$ and $B^c$ and use the fact that
$(A^c)^c = A$ and $(B^c)^c = B$, the result is

\[ (A^c \cup B^c)^c = A \cap B. \]

Part (b) then follows if we take the complement of both sides of this identity. □
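The two identities of Theorem 1.1.5 can be spot-checked on small finite sets. The following Python sketch is only an illustration: the universal set `X`, the sets `A` and `B`, and the helper `complement` are hypothetical choices of ours, and a check on one example is of course not a substitute for the proof above.

```python
# Hypothetical finite universal set and two of its subsets.
X = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}


def complement(S, universe=X):
    """Complement of S in the universal set."""
    return universe - S


# Part (a): the complement of a union is the intersection of the complements.
lhs_a = complement(A | B)
rhs_a = complement(A) & complement(B)

# Part (b): the complement of an intersection is the union of the complements.
lhs_b = complement(A & B)
rhs_b = complement(A) | complement(B)

print(lhs_a == rhs_a, lhs_b == rhs_b)  # True True
```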


A statement analogous to Theorem 1.1.5 is true for unions and intersections of
collections of sets (Exercise 1.1.7).

Two sets $A$ and $B$ are said to be disjoint if $A \cap B = \emptyset$. That is, they are
disjoint if they have no elements in common. A collection $\mathcal{A}$ of sets is called a
pairwise disjoint collection if $A \cap B = \emptyset$ for each pair $A$, $B$ of distinct sets in $\mathcal{A}$.


Functions. A function $f$ from a set $A$ to a set $B$ is a rule which assigns to each
element $x \in A$ exactly one element $f(x) \in B$. The element $f(x)$ is called the image
of $x$ under $f$ or the value of $f$ at $x$. We will write

\[ f : A \to B \]

to indicate that $f$ is a function from $A$ to $B$. The set $A$ is called the domain of $f$.
If $E$ is any subset of $A$, then we write

\[ f(E) = \{f(x) : x \in E\} \]

and call $f(E)$ the image of $E$ under $f$.

We don’t assume that every element of $B$ is the image of some element of $A$.
The set of elements of $B$ which are images of elements of $A$ is $f(A)$ and is called
the range of $f$. If every element of $B$ is the image of some element of $A$ (so that
the range of $f$ is $B$), then we say that $f$ is onto.

A function $f : A \to B$ is said to be one-to-one if, whenever $x, y \in A$ and $x \neq y$,
then $f(x) \neq f(y)$; that is, if $f$ takes distinct points to distinct points.

If $g : A \to B$ and $f : B \to C$ are functions, then there is a function $f \circ g : A \to C$,
called the composition of $f$ and $g$, defined by

\[ f \circ g(x) = f(g(x)). \]

Since $g(x) \in B$ and the domain of $f$ is $B$, this definition makes sense.
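For functions between finite sets, the notions of image, onto, one-to-one, and composition can be tried out concretely. In the Python sketch below, a function is modeled as a dictionary from domain elements to values; this modeling choice, the sets `A`, `B`, `C`, and the helper names are all hypothetical and serve only as an illustration.

```python
# Hypothetical finite sets and functions, with a function modeled as a dict.
A = {1, 2, 3}
B = {"a", "b", "c"}
C = {10, 20}

g = {1: "a", 2: "b", 3: "c"}     # g : A -> B
f = {"a": 10, "b": 20, "c": 10}  # f : B -> C


def image(func, E):
    """The image func(E) = {func(x) : x in E}."""
    return {func[x] for x in E}


def is_one_to_one(func):
    """Distinct points go to distinct points: no value is repeated."""
    return len(set(func.values())) == len(func)


def is_onto(func, codomain):
    """The range of func is the whole codomain."""
    return set(func.values()) == codomain


def compose(f, g):
    """The composition x -> f(g(x)), defined on the domain of g."""
    return {x: f[g[x]] for x in g}


f_after_g = compose(f, g)

print(is_one_to_one(g), is_onto(g, B))  # True True
print(is_one_to_one(f), is_onto(f, C))  # False True
print(f_after_g)                        # {1: 10, 2: 20, 3: 10}
```

Here `f` is onto but not one-to-one, since both `"a"` and `"c"` are sent to 10.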


If $f : A \to B$ is a function and $E \subset B$, then the inverse image of $E$ under $f$ is
the set

\[ f^{-1}(E) = \{x \in A : f(x) \in E\}. \]

That is, $f^{-1}(E)$ is the set of all elements of $A$ whose images under $f$ belong to $E$.




Inverse image behaves very well with respect to the set theory operations, as 
the following theorem shows. 


Theorem 1.1.6. If $f : A \to B$ is a function and $E$ and $F$ are subsets of $B$, then

(a) $f^{-1}(E \cup F) = f^{-1}(E) \cup f^{-1}(F)$;
(b) $f^{-1}(E \cap F) = f^{-1}(E) \cap f^{-1}(F)$; and
(c) $f^{-1}(E \setminus F) = f^{-1}(E) \setminus f^{-1}(F)$ if $F \subset E$.

Proof. We will prove (a) and leave the other two parts to the exercises.

To prove (a), we will show that $f^{-1}(E \cup F)$ and $f^{-1}(E) \cup f^{-1}(F)$ have the
same elements. If $x \in f^{-1}(E \cup F)$, then $f(x) \in E \cup F$. This means that $f(x)$ is in
$E$ or in $F$. If it is in $E$, then $x \in f^{-1}(E)$. If it is in $F$, then $x \in f^{-1}(F)$. In either
case, $x \in f^{-1}(E) \cup f^{-1}(F)$. This proves that every element of $f^{-1}(E \cup F)$ is an
element of $f^{-1}(E) \cup f^{-1}(F)$.

On the other hand, if $x \in f^{-1}(E) \cup f^{-1}(F)$, then $x \in f^{-1}(E)$, in which case
$f(x) \in E$, or $x \in f^{-1}(F)$, in which case $f(x) \in F$. In either case, $f(x) \in E \cup F$,
which implies $x \in f^{-1}(E \cup F)$. This proves that every element of $f^{-1}(E) \cup f^{-1}(F)$
is also an element of $f^{-1}(E \cup F)$. Combined with the previous paragraph, this
proves that the two sets are equal. □
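The three parts of Theorem 1.1.6 can be checked on a small example. The Python sketch below is an illustration only: the domain `A`, the squaring function `f`, the subsets `E`, `F`, `F2`, and the helper `preimage` are hypothetical choices of ours, and verifying one example proves nothing in general.

```python
# Hypothetical finite domain and function f(x) = x^2, with values in {0, ..., 4}.
A = {-2, -1, 0, 1, 2}


def f(x):
    return x * x


def preimage(func, domain, E):
    """The inverse image: all x in the domain with func(x) in E."""
    return {x for x in domain if func(x) in E}


E = {0, 1, 2}
F = {1, 4}
F2 = {1}  # a subset of E, as required in part (c)

part_a = preimage(f, A, E | F) == preimage(f, A, E) | preimage(f, A, F)
part_b = preimage(f, A, E & F) == preimage(f, A, E) & preimage(f, A, F)
part_c = preimage(f, A, E - F2) == preimage(f, A, E) - preimage(f, A, F2)

print(part_a, part_b, part_c)  # True True True
```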


Image does not behave as well as inverse image with respect to the set operations.
The best we can say is the following:

Theorem 1.1.7. If $f : A \to B$ is a function and $E$ and $F$ are subsets of $A$, then

(a) $f(E \cup F) = f(E) \cup f(F)$;
(b) $f(E \cap F) \subset f(E) \cap f(F)$;
(c) $f(E) \setminus f(F) \subset f(E \setminus F)$ if $F \subset E$.

Proof. We will prove (c) and leave the other parts to the exercises.

To prove (c), we must show that each element of $f(E) \setminus f(F)$ is also an element
of $f(E \setminus F)$. If $y \in f(E) \setminus f(F)$, then $y = f(x)$ for some $x \in E$ and $y$ is not the
image of any element of $F$. In particular, $x \notin F$. This means that $x \in E \setminus F$ and
so $y \in f(E \setminus F)$. This completes the proof. □


The above theorem cannot be improved. That is, it is not in general true that
$f(E \cap F) = f(E) \cap f(F)$ or that $f(E) \setminus f(F) = f(E \setminus F)$ if $F \subset E$. The first of
these facts is shown in the next example. The second is left to the exercises.

Example 1.1.8. Give an example of a function $f : A \to B$ for which there are
subsets $E, F \subset A$ with $f(E \cap F) \neq f(E) \cap f(F)$.

Solution: Let $A$ and $B$ both be $\mathbb{R}$ and let $f : A \to B$ be defined by

\[ f(x) = x^2. \]

If $E = (0, \infty)$ and $F = (-\infty, 0)$, then $E \cap F = \emptyset$, and so $f(E \cap F)$ is also the empty
set. However, $f(E) = f(F) = (0, \infty)$, and so $f(E) \cap f(F) = (0, \infty)$ as well. Clearly
$f(E \cap F)$ and $f(E) \cap f(F)$ are not the same in this case.
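A finite analogue of Example 1.1.8 can be run directly. In the Python sketch below, the infinite intervals of the example are replaced by the hypothetical finite sets `E = {1, 2, 3}` and `F = {-1, -2, -3}`; this substitution and the helper `image` are our own assumptions, made only so that the sets can be enumerated.

```python
def f(x):
    return x * x


def image(func, E):
    """The image func(E) = {func(x) : x in E}."""
    return {func(x) for x in E}


# Hypothetical finite stand-ins for the intervals (0, oo) and (-oo, 0).
E = {1, 2, 3}
F = {-1, -2, -3}

image_of_intersection = image(f, E & F)
intersection_of_images = image(f, E) & image(f, F)

print(image_of_intersection)           # set()
print(sorted(intersection_of_images))  # [1, 4, 9]
```

The inclusion of part (b) of Theorem 1.1.7 still holds here, since the empty set is a subset of every set, but equality fails.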




Cartesian Product. If $A$ and $B$ are sets, then their Cartesian product $A \times B$
is the set of all ordered pairs $(a, b)$ with $a \in A$ and $b \in B$. Similarly, the Cartesian
product of $n$ sets $A_1, A_2, \dots, A_n$ is the set $A_1 \times A_2 \times \cdots \times A_n$ of all ordered $n$-tuples
$(a_1, a_2, \dots, a_n)$ with $a_i \in A_i$ for $i = 1, \dots, n$.

If $f : A \to B$ is a function from a set $A$ to a set $B$, then the graph of $f$ is the
subset of $A \times B$ defined by $\{(a, b) \in A \times B : b = f(a)\}$.
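For finite sets, the Cartesian product and the graph of a function can be listed explicitly. The Python sketch below uses `itertools.product` from the standard library; the sets `A` and `B` and the function `f` are hypothetical examples chosen by us for illustration.

```python
from itertools import product

# Hypothetical finite sets and a function f : A -> B.
A = {1, 2, 3}
B = {1, 4, 9, 16}


def f(a):
    return a * a


# The Cartesian product A x B as a set of ordered pairs.
AxB = set(product(A, B))

# The graph of f: those pairs (a, b) in A x B with b = f(a).
graph = {(a, b) for (a, b) in AxB if b == f(a)}

print(len(AxB))       # 12
print(sorted(graph))  # [(1, 1), (2, 4), (3, 9)]
```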


Exercise Set 1.1

1. If $a, b \in \mathbb{R}$ and $a < b$, give a description in set theory notation for each of the
intervals $(a, b)$, $[a, b]$, $[a, b)$, and $(a, b]$ (see Example 1.1.1).

2. If $A$, $B$, and $C$ are sets, prove that

\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C). \]

3. If $A$ and $B$ are two sets, then prove that $A$ is the union of a disjoint pair of
sets, one of which is contained in $B$ and one of which is disjoint from $B$.

4. What is the intersection of all the open intervals containing the closed interval
$[0, 1]$? Justify your answer.

5. What is the intersection of all the closed intervals containing the open interval
$(0, 1)$? Justify your answer.

6. What is the union of all of the closed intervals contained in the open interval
$(0, 1)$? Justify your answer.

7. If $\mathcal{A}$ is a collection of subsets of a set $Y$, formulate and prove a theorem like
Theorem 1.1.5 for the intersection and union of $\mathcal{A}$.

8. Which of the following functions $f : \mathbb{R} \to \mathbb{R}$ are one-to-one and which ones are
onto? Justify your answer.
(a) $f(x) = x^2$;
(b) $f(x) = x^3$;
(c) $f(x) = e^x$.

9. Prove part (b) of Theorem 1.1.6.

10. Prove part (c) of Theorem 1.1.6.

11. Prove part (a) of Theorem 1.1.7.

12. Prove part (b) of Theorem 1.1.7.

13. Give an example of a function $f : A \to B$ and subsets $F \subset E$ of $A$ for which
$f(E) \setminus f(F) \neq f(E \setminus F)$.

14. Prove that equality holds in parts (b) and (c) of Theorem 1.1.7 if the function
$f$ is one-to-one.

15. Prove that if $f : A \to B$ is a function which is one-to-one and onto, then $f$
has an inverse function; that is, there is a function $g : B \to A$ such that
$g(f(x)) = x$ for all $x \in A$ and $f(g(y)) = y$ for all $y \in B$.

16. Prove that a subset $G$ of $A \times B$ is the graph of a function from $A$ to $B$ if and
only if the following condition is satisfied: for each $a \in A$ there is exactly one
$b \in B$ such that $(a, b) \in G$.


1.2. The Natural Numbers 


The natural numbers are the numbers we use for counting, and so, naturally, they
are also called the counting numbers. They are the positive integers 1, 2, 3, ....

The requirements for a system of numbers we can use for counting are very
simple. There should be a first number (the number 1), and for each number there
must always be a next number (a successor). After all, we don’t want to run out of
numbers when counting a large set of objects. This line of thought leads to Peano’s
axioms, which characterize the system of natural numbers $\mathbb{N}$:


N1. There is an element $1 \in \mathbb{N}$.
N2. For each $n \in \mathbb{N}$ there is a successor element $s(n) \in \mathbb{N}$.
N3. 1 is not the successor of an element of $\mathbb{N}$.
N4. If two elements of $\mathbb{N}$ have the same successor, then they are equal.
N5. If a subset $A$ of $\mathbb{N}$ contains 1 and is closed under succession (meaning $s(n) \in A$
whenever $n \in A$), then $A = \mathbb{N}$.

Note: At this stage in the development of the natural number system, all
we have are Peano’s axioms; addition has not yet been defined. When we define
addition in $\mathbb{N}$, $s(n)$ will turn out to be $n + 1$.


Everything we need to know about the natural numbers can be deduced from
these axioms. That is, using only Peano’s axioms, one can define addition and
multiplication of natural numbers and prove that they have the usual arithmetic
properties. One can also define the order relation on the natural numbers and
prove that it has the appropriate properties. To do all of this is not difficult, but
it is tedious and time consuming. We will do some of this here in the text and
the exercises, but we won't do it all. We will do enough so that students should
understand how such a development would proceed. Then we will state and discuss
the important properties of the resulting system of natural numbers.

Our main tool in this section will be mathematical induction, a powerful tech- 
nique that is a direct consequence of axiom N5. 


Induction. Axiom N5 above is often called the induction axiom, since it is the
basis for mathematical induction. Mathematical induction is used in making definitions
that involve a sequence of objects to be defined and in proving propositions
that involve a sequence of statements to be proved. Here, by a sequence we mean a
function whose domain is the natural numbers. Thus, a sequence of statements is
an assignment of a statement to each $n \in \mathbb{N}$. For example, “$n$ is either 1 or it is the
successor of some element of $\mathbb{N}$” is a sequence of statements, one for each $n \in \mathbb{N}$.
We will use induction to prove that all of these statements are true once we prove
the next theorem.




The following theorem states the mathematical induction principle as it applies 
to proving propositions. 


Theorem 1.2.1. Suppose $\{P_n\}$ is a sequence of statements, one for each $n \in \mathbb{N}$.
These statements are all true provided

(1) $P_1$ is true (the base case is true); and

(2) whenever $P_n$ is true for some $n \in \mathbb{N}$, then $P_{s(n)}$ is also true (the induction
step can be carried out).


Proof. Let $A$ be the subset of $\mathbb{N}$ consisting of those $n$ for which $P_n$ is true. Then
hypothesis (1) of the theorem implies that $1 \in A$, while hypothesis (2) implies that
$s(n) \in A$ whenever $n \in A$. By axiom N5, $A = \mathbb{N}$, and so $P_n$ is true for every $n$. □


Example 1.2.2. Prove that each $n \in \mathbb{N}$ is either $1$ or is the successor of some
element of $\mathbb{N}$.

Solution: If $n$ is $1$, then the statement is obviously true. Thus, the base case
is true. If the statement is true of $n$, then it is certainly true of $s(n)$, because it is
true of any element which is the successor of something in $\mathbb{N}$. Thus, by induction,
the statement is true for every $n \in \mathbb{N}$.


Another way to say what was proved in this example is that every natural
number except 1 has a predecessor. This statement doesn’t seem obvious at this
stage of development of $\mathbb{N}$, but its proof was a rather trivial application of induction.


Inductive Definitions. Inductive definitions are used to define sequences. The
sequence $\{x_n\}$ to be defined is a sequence of elements of some set $X$, which may or
may not be a set of numbers. We wish to define the sequence in such a way that
$x_1$ is a specified element of $X$ and, for each $n \in \mathbb{N}$, $x_{s(n)}$ is a certain function of $x_n$.
That is, we are given an element $x_1 \in X$ and a sequence of functions $f_n : X \to X$
and we wish to construct a sequence $\{x_n\}$, beginning with $x_1$, such that

$$x_{s(n)} = f_n(x_n) \quad \text{for all } n \in \mathbb{N}. \tag{1.2.1}$$

This equation, defining $x_{s(n)}$ in terms of $x_n$, is called a recursion relation.
Sequences defined in this way occur very often in mathematics. Newton’s method
from calculus and Euler’s method for numerically solving differential equations are
two important examples.
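
As an informal illustration, a recursion relation of the form (1.2.1) translates directly into a loop. The Python sketch below is only a sketch under stated assumptions: the name `build_sequence` and the choice to pass the whole family $\{f_n\}$ as one function of the pair $(n, x)$ are conventions invented here, Python integers stand in for $\mathbb{N}$, and only finitely many terms are computed.

```python
from typing import Callable, List, TypeVar

X = TypeVar("X")


def build_sequence(x1: X, f: Callable[[int, X], X], count: int) -> List[X]:
    """Return [x_1, ..., x_count], where x_{n+1} = f(n, x_n) as in (1.2.1).

    `f(n, x)` plays the role of f_n(x); `count` is assumed to be at least 1.
    """
    terms = [x1]
    for n in range(1, count):
        # The newest term is x_n; apply f_n to it to obtain x_{n+1}.
        terms.append(f(n, terms[-1]))
    return terms


# Newton's method for t^2 = 2: every f_n is the same map t -> (t + 2/t) / 2.
newton = build_sequence(1.0, lambda n, t: (t + 2.0 / t) / 2.0, 6)

# A recursion whose step really depends on n: x_{n+1} = (n + 1) * x_n.
factorials = build_sequence(1, lambda n, t: (n + 1) * t, 5)
```

In the first call the functions $f_n$ do not depend on $n$; in the second they do, which is why (1.2.1) allows a different function at each step.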


Theorem 1.2.3. Given a set $X$, an element $x_1 \in X$, and a sequence $\{f_n\}$ of
functions from $X$ to $X$, there is a unique sequence $\{x_n\}$ in $X$, beginning with $x_1$,
which satisfies $x_{s(n)} = f_n(x_n)$ for all $n \in \mathbb{N}$.


Proof. Consider the Cartesian product $\mathbb{N} \times X$, that is, the set of all ordered pairs
$(n, x)$ with $n \in \mathbb{N}$ and $x \in X$. We define a function $S : \mathbb{N} \times X \to \mathbb{N} \times X$ by

$$S(n, x) = (s(n), f_n(x)). \tag{1.2.2}$$

We say that a subset $E$ of $\mathbb{N} \times X$ is closed under $S$ if $S$ sends elements of $E$ to
elements of $E$. Clearly the intersection of all subsets of $\mathbb{N} \times X$ that are closed
under $S$ and contain $(1, x_1)$ is also closed under $S$ and contains $(1, x_1)$. This is the
smallest subset of $\mathbb{N} \times X$ that is closed under $S$ and contains $(1, x_1)$. We will call
this set $A$. Note that $(1, x_1)$ is the only element of $A$ which is not in the range of $S$.
This is because any other such element could be removed from $A$ and the resulting
set would still contain $(1, x_1)$ and be closed under $S$.


To complete the argument, we will show that the set $A$ is the graph of a function
from $\mathbb{N}$ to $X$, that is, it has the form $\{(n, x_n) : n \in \mathbb{N}\}$ for a certain sequence
$\{x_n\}$ in $X$. This is the sequence we are seeking. To prove that $A$ is the graph of a
function from $\mathbb{N}$ to $X$, we must show that each $n \in \mathbb{N}$ is the first element of exactly
one pair $(n, x) \in A$. We prove this by induction.

The element $1$ is the first element of the pair $(1, x_1)$, which is in $A$ by the
construction of $A$. If there were another element $x \in X$ such that $(1, x) \in A$, then
$(1, x)$ would be an element, not equal to $(1, x_1)$, which fails to be in the range of $S$.
This is due to the fact that $1$ is not the successor of any element of $\mathbb{N}$ by N3.
Since $(1, x_1)$ is the only element of $A$ which is not in the range of $S$, no such $x$ exists.


Now, for the induction step, suppose for some $n$ we know that there is a unique
element $x_n \in X$ such that $(n, x_n) \in A$. Then $S(n, x_n) = (s(n), f_n(x_n))$ is in $A$.
Suppose there is another element $(s(n), x) \in A$ with $x \ne f_n(x_n)$ and suppose this
element is in the image of $S$, that is, $(s(n), x) = S(m, y) = (s(m), f_m(y))$ for some
$(m, y) \in A$. Then $n = m$ by N4, and $y = x_n$ by the induction assumption. Thus
if $(s(n), x)$ is really different from $(s(n), f_n(x_n))$, then it cannot be in the image of
$S$. Since $(1, x_1)$ is the only element of $A$ which is not in the image of $S$ and since
$s(n) \ne 1$, we conclude there is no such element $(s(n), x)$. By induction, for each
element $n$ of $\mathbb{N}$ there is a unique element $x_n \in X$ such that $(n, x_n) \in A$. Thus, $A$ is
the graph of a function $n \mapsto x_n$ from $\mathbb{N}$ to $X$.

This shows the existence of a sequence with the required properties. We leave
the proof that this sequence is unique to the exercises. □


Note that the proof of the above theorem used all of Peano’s axioms, not just 
N5. 


Using Peano’s Axioms to Develop Properties of N. In this subsection, we
will demonstrate some of the steps involved in developing the arithmetic and order
properties of $\mathbb{N}$ using only Peano’s axioms. It is not a complete development, but
just a taste of what is involved. We begin with the definition of addition.


Definition 1.2.4. We fix $m \in \mathbb{N}$ and define a sequence $\{m + n\}_{n \in \mathbb{N}}$ inductively as
follows:

$$m + 1 = s(m), \tag{1.2.3}$$

and

$$m + s(n) = s(m + n). \tag{1.2.4}$$

These two conditions determine a unique sequence $\{m + n\}_{n \in \mathbb{N}}$ by Theorem 1.2.3
(take $X = \mathbb{N}$, $x_1 = s(m)$, and each $f_n$ equal to $s$).
Note that (1.2.4) is the recursion relation of the inductive definition. It tells us how
$m + s(n)$ is to be defined assuming that $m + n$ has already been defined.


By (1.2.3) of the above definition, the successor s(n) of n is our newly defined 
n+ 1. At this point we will begin using n + 1 in place of s(n) in our inductive 
arguments and definitions. 
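
Definition 1.2.4 can also be read as a procedure. The following Python sketch is an illustration only, not part of the formal development: Python integers are assumed to model $\mathbb{N}$, the built-in `+ 1` is used solely to model the successor map, and the names `successor` and `add` are invented for this example.

```python
def successor(n: int) -> int:
    """Model of the successor map s; the only primitive used by `add`."""
    return n + 1


def add(m: int, n: int) -> int:
    """Compute m + n using only (1.2.3) and (1.2.4)."""
    if m < 1 or n < 1:
        raise ValueError("both arguments must be natural numbers (>= 1)")
    result = successor(m)          # (1.2.3): m + 1 = s(m)
    k = 1
    while k != n:
        # At this point result equals m + k.
        result = successor(result)  # (1.2.4): m + s(k) = s(m + k)
        k = successor(k)
    return result
```

For instance, `add(3, 4)` applies the successor map once for (1.2.3) and three more times for (1.2.4), mirroring how the sequence $\{m + n\}$ is built one term at a time.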




Example 1.2.5. Using the above definition and Peano’s axioms, prove the
associative law for addition in $\mathbb{N}$. That is, prove

$$m + (n + k) = (m + n) + k \quad \text{for all } k, n, m \in \mathbb{N}.$$

Solution: We fix $m$ and $n$ and, for each $k \in \mathbb{N}$, let $P_k$ be the proposition
$m + (n + k) = (m + n) + k$. We prove that $P_k$ is true for all $k \in \mathbb{N}$ by induction on
$k$.


The base case $P_1$ is just

$$m + (n + 1) = (m + n) + 1, \tag{1.2.5}$$

which is the recursion relation (1.2.4) used in the definition of addition once we
replace $s(n)$ with $n + 1$. Thus, $P_1$ is true by definition.

For the induction step, we assume $P_k$ is true for some $k$; that is, we assume

$$m + (n + k) = (m + n) + k.$$

We then take the successor of both sides of this equation to obtain

$$(m + (n + k)) + 1 = ((m + n) + k) + 1.$$

If we use (1.2.5) on both sides of this equation, the result is

$$m + ((n + k) + 1) = (m + n) + (k + 1).$$

Using (1.2.5) again, this time on the left side of the equation, leads to

$$m + (n + (k + 1)) = (m + n) + (k + 1).$$

Since this is proposition $P_{k+1}$, the induction is complete.

Example 1.2.6. Using Definition 1.2.4 and Peano’s axioms, prove that $1 + n = n + 1$
for every $n \in \mathbb{N}$.

Solution: Let $P_n$ be the statement $1 + n = n + 1$. We prove by induction that
$P_n$ is true for every $n$. It is trivially true in the base case $n = 1$, since $P_1$ just says
$1 + 1 = 1 + 1$.

For the induction step, we assume that $P_n$ is true for some $n$; that is, we
assume $1 + n = n + 1$. If we add $1$ to both sides of this equation (i.e., take the
successor of both sides), we have

$$(1 + n) + 1 = (n + 1) + 1.$$

By Definition 1.2.4, the left side of this equation is equal to $1 + (n + 1)$. Thus,

$$1 + (n + 1) = (n + 1) + 1.$$

Thus, $P_{n+1}$ is true if $P_n$ is true and the induction is complete.

A similar induction, this time on $m$, with $n$ fixed, can be used to prove the
commutative law of addition, that is, $m + n = n + m$ for all $n, m \in \mathbb{N}$. The base
case for this induction is the statement proved above. The associative law proved
in Example 1.2.5 is needed in the proof of the induction step. We leave the details
to the exercises.


We leave the definition of multiplication in $\mathbb{N}$ to the exercises. Its definition
and the fact that it also satisfies the associative and commutative laws follow a
pattern similar to the one above for addition. Once multiplication is defined, we
can define factors and prime numbers:


Definition 1.2.7. If a number $n \in \mathbb{N}$ can be written as $n = mk$ with both $m \in \mathbb{N}$
and $k \in \mathbb{N}$, then $k$ and $m$ are called factors of $n$ and are said to divide $n$. If $n \ne 1$
and the only factors of $n$ are $1$ and $n$, then $n$ is said to be prime.


The order relation in $\mathbb{N}$ can be defined as follows:


Definition 1.2.8. If $n, m \in \mathbb{N}$, we will say that $n$ is less than $m$, denoted $n < m$,
if there is a $k \in \mathbb{N}$ such that $m = n + k$. We say $n$ is less than or equal to $m$ and
write $n \le m$ if $n < m$ or $n = m$.


Some of the properties of this order relation are worked out in the exercises. 
One of these is that each factor of n is necessarily less than or equal to n (Exercise 
1.2.7). 


Example 1.2.9. Prove that each natural number $n > 1$ is a product of primes.

Solution: Here we understand that a prime number itself is a product of
primes, namely a product with only one factor. Note that if $k$ and $m$ are two numbers
which are products of primes, then their product $km$ is also a product of primes.

Let the proposition $P_n$ be that every $m \in \mathbb{N}$, with $1 < m \le n$, is a product of
primes.

Base case: $P_1$ is true because there is no $m \in \mathbb{N}$ with $1 < m \le 1$.

Induction step: Suppose $n$ is a natural number for which $P_n$ is true. Then each
$m$ with $1 < m \le n$ is a product of primes. Now $n + 1 > 1$ and so it is either a
prime or it factors as a product $km$ with $k$ and $m$ not equal to $1$ or $n + 1$. In the
first case, $P_{n+1}$ is true. In the second case, both $k$ and $m$ are less than $n + 1$ and,
hence, less than or equal to $n$. Since $P_n$ is true, $k$ and $m$ are products of primes.
This implies that $n + 1 = km$ is also a product of primes and, in turn, this implies
that $P_{n+1}$ is true.

By induction, $P_n$ is true for all $n \in \mathbb{N}$ and this means that every natural number
$n > 1$ is a product of primes.
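
The induction in Example 1.2.9 is constructive, and its structure can be mirrored by a recursive procedure: a number is either prime or splits into two smaller factors, each of which is handled by the same procedure. The Python sketch below is only an illustration; the function name `prime_factors` is invented here, Python integers stand in for $\mathbb{N}$, and trial division is used for clarity rather than speed.

```python
from typing import List


def prime_factors(n: int) -> List[int]:
    """Return a list of primes whose product is n, for n > 1."""
    if n <= 1:
        raise ValueError("n must be a natural number greater than 1")
    for k in range(2, n):
        if n % k == 0:
            # n = k * m with both factors strictly smaller than n, so the
            # "induction hypothesis" applies to each of them.
            return prime_factors(k) + prime_factors(n // k)
    # No factor other than 1 and n was found, so n is prime: a product
    # with only one factor.
    return [n]
```

The recursive calls play the role of the hypothesis $P_n$: they are made only on numbers smaller than the one being factored.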


Additional Examples of the Use of Induction. At this point we leave the 
discussion of Peano’s axioms and the development of the properties of the natural 
numbers. The remainder of the section is devoted to further examples of inductive 
proofs and inductive definitions. Some of these involve the real number system, 
which won’t be discussed until Section 1.4. Nevertheless we are happy to anticipate 
its development and use its properties in these examples. 

Example 1.2.10. Prove by induction that every number of the form $5^n - 2^n$, with
$n \in \mathbb{N}$, is divisible by 3.

Solution: The proposition $P_n$ is that $5^n - 2^n$ is divisible by 3.

Base case: Since $5 - 2 = 3$, $P_1$ is true.

Induction step: We need to show that $P_{n+1}$ is true whenever $P_n$ is true. We
do this by rewriting the expression $5^{n+1} - 2^{n+1}$ as

$$5^{n+1} - 2^{n+1} = 5^{n+1} - 5 \cdot 2^n + 5 \cdot 2^n - 2^{n+1} = 5(5^n - 2^n) + (5 - 2)2^n.$$

If $P_n$ is true, then the first term on the right is divisible by 3. The second term on
the right is also divisible by 3, since $5 - 2 = 3$. This implies that $5^{n+1} - 2^{n+1}$ is
divisible by 3 and, hence, that $P_{n+1}$ is true. This completes the induction step.

By induction (that is, by Theorem 1.2.1), $P_n$ is true for all $n$.
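
A quick numerical check can make the algebra of the induction step in Example 1.2.10 more tangible, although it is of course no substitute for the proof. In the Python sketch below, the names `residue` and `split_terms` are invented for this illustration.

```python
from typing import Tuple


def residue(n: int) -> int:
    """Remainder of 5^n - 2^n on division by 3."""
    return (5 ** n - 2 ** n) % 3


def split_terms(n: int) -> Tuple[int, int]:
    """The two terms 5(5^n - 2^n) and (5 - 2)2^n used in the induction step."""
    return 5 * (5 ** n - 2 ** n), (5 - 2) * 2 ** n
```

Summing the pair returned by `split_terms(n)` recovers $5^{n+1} - 2^{n+1}$, which is exactly the rewriting used to pass from $P_n$ to $P_{n+1}$.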


Example 1.2.11. Define a sequence $\{x_n\}$ of real numbers by setting $x_1 = 1$ and
using the recursion relation

$$x_{n+1} = \sqrt{x_n + 1}. \tag{1.2.6}$$

Show that this is an increasing sequence of positive numbers less than 2.

Solution: The function $f(x) = \sqrt{x + 1}$ may be regarded as a function from
the set of positive real numbers into itself. We can apply Theorem 1.2.3, with each
of the functions $f_n$ equal to $f$, to conclude that a sequence $\{x_n\}$ is uniquely defined
by setting $x_1 = 1$ and imposing the recursion relation (1.2.6).

Let $P_n$ be the proposition that $x_n < x_{n+1} < 2$. We will prove that $P_n$ is true
for all $n$ by induction.

Base case: $P_1$ is the statement $x_1 < x_2 < 2$. Since $x_1 = 1$ and $x_2 = \sqrt{2}$, this is
true.

Induction step: Suppose $P_n$ is true for some $n$. Then $x_n < x_{n+1} < 2$. If we
add one and take the square root, this becomes

$$\sqrt{x_n + 1} < \sqrt{x_{n+1} + 1} < \sqrt{3}.$$

Using the recursion relation (1.2.6), this yields

$$x_{n+1} < x_{n+2} < \sqrt{3}.$$

Since $\sqrt{3} < 2$, $P_{n+1}$ is true. This completes the induction step.

We conclude that $P_n$ is true for all $n \in \mathbb{N}$.
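
The behavior proved in Example 1.2.11 can be observed numerically. The Python sketch below is illustrative only: the name `sqrt_sequence` is invented here, and the terms are floating-point approximations, so the strict inequalities can only be expected to be visible for the first several terms before rounding takes over.

```python
import math
from typing import List


def sqrt_sequence(count: int) -> List[float]:
    """Approximate x_1, ..., x_count for x_1 = 1, x_{n+1} = sqrt(x_n + 1)."""
    terms = [1.0]
    for _ in range(count - 1):
        terms.append(math.sqrt(terms[-1] + 1.0))
    return terms


first_terms = sqrt_sequence(15)
```

Inspecting `first_terms` shows each term exceeding the previous one while staying below 2, in line with the propositions $P_n$.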


Binomial Formula. The proof of the binomial formula is an excellent example
of the use of induction.

We will use the notation

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

This is the number of ways of choosing $k$ objects from a set of $n$ objects.

Theorem 1.2.12. If $x$ and $y$ are real numbers and $n \in \mathbb{N}$, then

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$
k=0 


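Proof. We prove this by induction on $n$.

Base case: Since $\binom{1}{0}$ and $\binom{1}{1}$ are both $1$, the binomial formula is true when
$n = 1$.

Induction step: If we assume the formula is true for a certain $n$, then multiplying
both sides of this formula by $x + y$ yields

$$\begin{aligned}
(x + y)^{n+1} &= x \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} + y \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} x^{k+1} y^{n-k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k+1}.
\end{aligned} \tag{1.2.7}$$

If we change variables in the first sum on the second line of (1.2.7) by replacing $k$
by $k - 1$, then our expression for $(x + y)^{n+1}$ becomes

$$\begin{aligned}
(x + y)^{n+1} &= x^{n+1} + \sum_{k=1}^{n} \binom{n}{k-1} x^k y^{n-k+1} + \sum_{k=1}^{n} \binom{n}{k} x^k y^{n-k+1} + y^{n+1} \\
&= x^{n+1} + \sum_{k=1}^{n} \left[ \binom{n}{k-1} + \binom{n}{k} \right] x^k y^{n-k+1} + y^{n+1}.
\end{aligned} \tag{1.2.8}$$

If we use the identity (to be proved in Exercise 1.2.17)

$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k},$$

then the right side of equation (1.2.8) becomes

$$x^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} x^k y^{n+1-k} + y^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} x^k y^{n+1-k}.$$

Thus, the binomial formula is true for $n + 1$ if it is true for $n$. This completes the
induction step and the proof of the theorem. □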
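
The binomial formula and the identity used in its proof are easy to spot-check with exact integer arithmetic. The Python sketch below is a sanity check rather than a proof; it assumes Python 3.8 or later for `math.comb`, and the name `binomial_expansion` is invented for this illustration.

```python
from math import comb


def binomial_expansion(x: int, y: int, n: int) -> int:
    """Evaluate the sum over k = 0, ..., n of C(n, k) * x^k * y^(n - k)."""
    return sum(comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))


def pascal_identity_holds(n: int, k: int) -> bool:
    """Check C(n, k - 1) + C(n, k) == C(n + 1, k) for 1 <= k <= n."""
    return comb(n, k - 1) + comb(n, k) == comb(n + 1, k)
```

Comparing `binomial_expansion(x, y, n)` with `(x + y) ** n` for a few integer inputs exercises Theorem 1.2.12, while `pascal_identity_holds` exercises the identity that drives the induction step.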


Exercise Set 1.2 


In the first seven exercises use only Peano’s axioms and results that were proved in
Section 1.2 using only Peano’s axioms.

1. Prove that the commutative law for addition, $n + m = m + n$, holds in $\mathbb{N}$. Use
induction and Examples 1.2.6 and 1.2.5.

2. Prove that if $n, m \in \mathbb{N}$, then $m + n \ne n$. Hint: Use induction on $n$.

3. Use the preceding exercise to prove that if $n, m \in \mathbb{N}$, $n \le m$, and $m \le n$, then
$n = m$. This is the antisymmetric property of an order relation.

4. Prove that the order relation on $\mathbb{N}$ has the transitive property: if $k \le n$ and
$n \le m$, then $k \le m$.

5. Use the preceding exercise and Peano’s axioms to prove that if $n \in \mathbb{N}$, then for
each element $m \in \mathbb{N}$ either $m \le n$ or $n \le m$. Hint: Use induction on $n$.

6. Show how to define the product $nm$ of two natural numbers. Hint: Use
induction on $m$.

7. Use the definition of product that you gave in the preceding exercise to prove
that if $n, m \in \mathbb{N}$, then $n \le nm$.


For the remaining exercises you are no longer restricted to just using Peano’s axioms
and their immediate consequences.

8. Using induction, prove that $7^n - 2^n$ is divisible by 5 for every $n \in \mathbb{N}$.

9. Using induction, prove that $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ for every $n \in \mathbb{N}$.

10. Using induction, prove that $\sum_{k=1}^{n} (2k - 1) = n^2$ for every $n \in \mathbb{N}$.

11. Finish the proof of Theorem 1.2.3 by showing that there is only one sequence
$\{x_n\}$ which satisfies the conditions of the theorem.

12. If $x_1$ is chosen so that $0 < x_1 < 2$ and $x_n$ is defined inductively by
$x_{n+1} = \sqrt{x_n + 2}$, then prove by induction that $0 < x_n < x_{n+1} < 2$ for all $n \in \mathbb{N}$.

13. Let a sequence $\{x_n\}$ of numbers be defined recursively by

$$x_1 = 0 \quad \text{and} \quad x_{n+1} = \frac{x_n + 1}{2}.$$

Prove by induction that $x_n < x_{n+1}$ for all $n \in \mathbb{N}$. Would this conclusion change
if we set $x_1 = 2$?

14. Let a sequence $\{x_n\}$ of numbers be defined recursively by
$x_1 = 1$ and $x_{n+1} = $ […].
Prove by induction that $x_{n+2}$ is between $x_n$ and $x_{n+1}$ for each $n \in \mathbb{N}$.

15. Mathematical induction also works for a sequence $P_k, P_{k+1}, \ldots$ of propositions,
indexed by the integers $n \ge k$ for some $k \in \mathbb{N}$. The statement is: if $P_k$ is true
and $P_{n+1}$ is true whenever $P_n$ is true and $n \ge k$, then $P_n$ is true for all $n \ge k$.
Prove this.

16. Use induction in the form stated in the preceding exercise to prove that $n^2 < 2^n$
for all $n \ge 5$.

17. Prove the identity

$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k},$$

which was used in the proof of Theorem 1.2.12.

18. Write out the binomial formula in the case $n = 4$.

19. Prove the well ordering principle for the natural numbers: each non-empty
subset $S$ of $\mathbb{N}$ contains a smallest element. Hint: Apply the induction axiom to
the set

$$T = \{n \in \mathbb{N} : n \le m \text{ for all } m \in S\}.$$

20. Use the result of Exercise 1.2.19 to prove the division algorithm: if $n$ and $m$
are natural numbers with $m < n$ and if $m$ does not divide $n$, then there are
natural numbers $q$ and $r$ such that $n = qm + r$ and $r < m$. Hint: Consider the
set $S$ of all natural numbers $s$ such that $(s + 1)m > n$.




1.3. Integers and Rational Numbers 


The need for number systems larger than the natural numbers became apparent
early in mathematical history. We need the number 0 in order to describe the
number of elements in the empty set. The negative numbers are needed to describe
deficits. Also, the operation of subtraction leads to non-positive integers unless
$n - m$ is to be defined only for $m < n$.


Beginning with the system of natural numbers $\mathbb{N}$ and its properties derivable
from Peano’s axioms, the system of integers $\mathbb{Z}$ can easily be constructed. One
simply adjoins to $\mathbb{N}$ a new element called $0$ and for each $n \in \mathbb{N}$ a new element called $-n$.
Of course, one then has to define addition and multiplication and an order relation
“$\le$” for this new set $\mathbb{Z}$ in a way that is consistent with the existing definitions of
these things for $\mathbb{N}$. When addition and multiplication are defined, we want them to
have the properties that $0 + n = n$ and $n + (-n) = 0$. It turns out that these
requirements and the commutative, associative, and distributive laws (described below)
are enough to uniquely determine how addition and multiplication are defined in
$\mathbb{Z}$.

When all of this has been carried out, the new set of numbers $\mathbb{Z}$ can be shown
to be a commutative ring, meaning that it satisfies the axioms listed below.


The Commutative Ring of Integers. A binary operation on a set $A$ is a rule
which assigns to each ordered pair $(a, b)$ of elements of $A$ a third element of $A$.


Definition 1.3.1. A commutative ring is a set $R$ with two binary operations,
addition ($(a, b) \mapsto a + b$) and multiplication ($(a, b) \mapsto ab$), that satisfy the following
axioms:

A1. (Commutative Law of Addition) $x + y = y + x$ for all $x, y \in R$.

A2. (Associative Law of Addition) $x + (y + z) = (x + y) + z$ for all $x, y, z \in R$.

A3. (Additive Identity) There is an element $0 \in R$ such that $0 + x = x$ for all
$x \in R$.

A4. (Additive Inverses) For each $x \in R$, there is an element $-x$ such that
$x + (-x) = 0$.

M1. (Commutative Law of Multiplication) $xy = yx$ for all $x, y \in R$.

M2. (Associative Law of Multiplication) $x(yz) = (xy)z$ for all $x, y, z \in R$.

M3. (Multiplicative Identity) There is an element $1 \in R$ such that $1 \ne 0$ and
$1x = x$ for all $x \in R$.

D. (Distributive Law) $x(y + z) = xy + xz$ for all $x, y, z \in R$.

A large number of familiar properties of numbers can be proved using these
axioms, and this means that these properties hold in any commutative ring. We
will prove some of these in the examples and exercises.


Example 1.3.2. If $F$ is a commutative ring and $x, y, z \in F$, prove that
(a) $x + z = y + z$ implies $x = y$;
(b) $x \cdot 0 = 0$;
(c) $(-x)y = -xy$.

Solution: Suppose $x + z = y + z$. On adding $-z$ to both sides, this becomes

$$(x + z) + (-z) = (y + z) + (-z).$$

Applying the associative law of addition (A2) yields

$$x + (z + (-z)) = y + (z + (-z)).$$

But $z + (-z) = 0$ by A4 and $x + 0 = x$ by A3 and A1. Similarly, $y + 0 = y$. We
conclude that $x = y$. This proves (a).

By A3, $0 + 0 = 0$. By D and A3,

$$x \cdot 0 + x \cdot 0 = x \cdot (0 + 0) = x \cdot 0 = 0 + x \cdot 0.$$

Using (a) above, we conclude that $x \cdot 0 = 0$.

To prove (c), we first note that, by definition, $-xy$ is the additive inverse of
$xy$ (it follows from (a) that there is only one of these). We will show that $(-x)y$ is
also an additive inverse for $xy$. By D, M1, A4, and (b),

$$xy + (-x)y = (x + (-x))y = 0 \cdot y = 0.$$

This proves that $(-x)y$ is an additive inverse for $xy$ and, hence, it must be $-xy$.


Subtraction in a commutative ring is defined in terms of addition and the
additive inverse by setting

$$x - y = x + (-y).$$


The system of integers satisfies all the laws of Definition 1.3.1, and so it is a
commutative ring. In fact, it is a commutative ring with an order relation, since
the order relation on $\mathbb{N}$ can be used to define a compatible order relation on $\mathbb{Z}$.
However, $\mathbb{Z}$ is still inadequate as a number system. This is due to our need to talk
about fractional parts of things. This defect is fixed by passing from the integers
to the rational numbers.


The Field of Rational Numbers. A field is a commutative ring in which
division is possible as long as the divisor is not $0$. That is,


Definition 1.3.3. A field is a commutative ring satisfying the additional axiom:

M4. (Multiplicative Inverses) For each non-zero element $x$ there is an element $x^{-1}$
such that $x^{-1}x = 1$.

In a field, an element $y$ can be divided by any non-zero element $x$. The result
is $x^{-1}y$, which can also be written as $y/x$ or $\frac{y}{x}$.

The rational number system $\mathbb{Q}$ is a field that is constructed directly from the
integers. The construction begins by considering all symbols of the form $\frac{n}{m}$, with
$n, m \in \mathbb{Z}$ and $m \ne 0$. We identify two such symbols $\frac{n}{m}$ and $\frac{p}{q}$ whenever $nq = mp$.
The resulting object is called a fraction. Thus, $\frac{4}{6}$ and $\frac{2}{3}$ represent the same fraction
because $4 \cdot 3 = 6 \cdot 2$. The set $\mathbb{Q}$ is then the set of all fractions.

Addition and multiplication in $\mathbb{Q}$ are defined in the familiar way:

$$\frac{n}{m} + \frac{p}{q} = \frac{nq + mp}{mq} \quad \text{and} \quad \frac{n}{m} \cdot \frac{p}{q} = \frac{np}{mq}.$$

A fraction of the form $\frac{n}{1}$ is identified with the integer $n$. This makes the set of
integers $\mathbb{Z}$ a subset of $\mathbb{Q}$.
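
To make the construction concrete, the symbols $\frac{n}{m}$ can be modeled as pairs of integers, with the identification $nq = mp$ as the test for representing the same fraction. The Python sketch below is illustrative only; the names `Frac`, `same_fraction`, `frac_add`, and `frac_mul` are invented here, and no reduction to lowest terms is attempted.

```python
from typing import Tuple

# A pair (n, m) with m != 0 stands for the symbol n/m.
Frac = Tuple[int, int]


def same_fraction(a: Frac, b: Frac) -> bool:
    """n/m and p/q represent the same fraction exactly when nq = mp."""
    (n, m), (p, q) = a, b
    return n * q == m * p


def frac_add(a: Frac, b: Frac) -> Frac:
    """n/m + p/q = (nq + mp)/(mq)."""
    (n, m), (p, q) = a, b
    return (n * q + m * p, m * q)


def frac_mul(a: Frac, b: Frac) -> Frac:
    """(n/m)(p/q) = (np)/(mq)."""
    (n, m), (p, q) = a, b
    return (n * p, m * q)
```

For example, `same_fraction((4, 6), (2, 3))` holds because $4 \cdot 3 = 6 \cdot 2$. Adding a third fraction to `(4, 6)` and to `(2, 3)` gives pairs that again pass `same_fraction`, which is what it means for the operations to be well defined on fractions rather than on symbols.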


The above construction yields a system that satisfies A1 through A4, M1
through M4, and D. It is therefore a field. We call it the field of rational numbers
and denote it by $\mathbb{Q}$. We won’t prove here that $\mathbb{Q}$ satisfies all of the field axioms,
but a few of them will be verified in the examples and exercises of this section. We
will also use the examples and exercises to show how the field axioms can be used
to prove other standard facts about arithmetic in fields such as $\mathbb{Q}$.


Example 1.3.4. Assuming that $\mathbb{Z}$ satisfies the axioms of a commutative ring,
verify that $\mathbb{Q}$ satisfies A3 and M3.

Solution: The additive identity in $\mathbb{Z}$ is the integer $0$, which is identified with
the fraction $\frac{0}{1}$. If we add this to another fraction $\frac{n}{m}$, the result is

$$\frac{0}{1} + \frac{n}{m} = \frac{0 \cdot m + 1 \cdot n}{1 \cdot m} = \frac{n}{m}.$$

Thus, $0 = \frac{0}{1}$ is an additive identity for $\mathbb{Q}$ and axiom A3 is satisfied.

The multiplicative identity in $\mathbb{Z}$ is the integer $1$, which is identified with the
fraction $\frac{1}{1}$. If we multiply this by another fraction $\frac{n}{m}$, the result is

$$\frac{1}{1} \cdot \frac{n}{m} = \frac{1 \cdot n}{1 \cdot m} = \frac{n}{m}.$$

Thus, $1 = \frac{1}{1}$ is a multiplicative identity for $\mathbb{Q}$ and axiom M3 is satisfied.


Example 1.3.5. Verify that $\mathbb{Q}$ satisfies M4.

Solution: We know that the elements of $\mathbb{Q}$ of the form $\frac{0}{m}$ represent the zero
element of $\mathbb{Q}$. Thus, each non-zero element is represented by a fraction $\frac{n}{m}$ in which
$n \ne 0$. Then $\frac{m}{n}$ is also a fraction, and

$$\frac{m}{n} \cdot \frac{n}{m} = \frac{mn}{nm} = \frac{1}{1} = 1.$$

Thus, $\frac{m}{n}$ is a multiplicative inverse for $\frac{n}{m}$. This proves that M4 is satisfied in $\mathbb{Q}$.

The Ordered Field of Rational Numbers. Using the order relation on the
integers, it is easy to define an order relation on $\mathbb{Q}$. If $r$ is an element of $\mathbb{Q}$, then we
declare $r \ge 0$ if $r$ can be represented in the form $\frac{n}{m}$ for integers $n \ge 0$ and $m > 0$.
The order relation is then defined by declaring

$$\frac{p}{q} \le \frac{n}{m} \quad \text{if and only if} \quad \frac{n}{m} - \frac{p}{q} \ge 0.$$

With the order relation defined this way, $\mathbb{Q}$ becomes an ordered field. That is, it
satisfies the axioms in the following definition.


Definition 1.3.6. A field $F$ is called an ordered field if it has an order relation
“$\le$” such that the following are satisfied for all $x, y, z \in F$:

O1. Either $x \le y$ or $y \le x$.

O2. If $x \le y$ and $y \le x$, then $x = y$.

O3. If $x \le y$ and $y \le z$, then $x \le z$.

O4. If $x \le y$, then $x + z \le y + z$.

O5. If $x \le y$ and $0 \le z$, then $xz \le yz$.

Remark 1.3.7. Given an order relation “$\le$”, we don’t distinguish between the
statements “$x \le y$” and “$y \ge x$”; they mean the same thing. Also, if $x \le y$ and
$x \ne y$, then we write $x < y$ or, equivalently, $y > x$.


Example 1.3.8. Prove that if $F$ is an ordered field, then

(a) if $x, y \in F$ and $x \le y$, then $-y \le -x$;
(b) if $x \in F$, then $x^2 \ge 0$;
(c) $0 < 1$.

Solution: If $x \le y$, then $0 = x - x \le y - x$ by O4. Using O4 again, along
with A1 through A4, yields $-y \le (y - x) - y = -x$. This completes the proof of
(a).

By O1, if $x \in F$, then $0 \le x$ or $x \le 0$. If $0 \le x$, then we multiply this
inequality by $x$ and use O5 to conclude that $0 \le x^2$. On the other hand, suppose
$x \le 0$. Then, by part (a), $0 \le -x$. As above, we conclude that $0 \le (-x)^2$. Since
$(-x)^2 = x^2$ (Exercise 1.3.5), the proof of part (b) is complete.

Since $1^2 = 1$, part (b) implies that $0 \le 1$. By M3, $1 \ne 0$ and so $0 < 1$.


Defects of the Rational Field. The rational number system is very satisfying
in many ways and is highly useful. However, there are real-world mathematical
problems that appear to have real-world numerical solutions, but these solutions
cannot be rational numbers. For example, the Pythagorean Theorem tells us that if
the legs of a right triangle have length $a$ and $b$, then the length $c$ of the hypotenuse
satisfies the equation

$$c^2 = a^2 + b^2.$$

However, there are many examples of rational and even integer choices for $a$ and
$b$ such that this equation has no rational solution for $c$. The simplest example is
$a = b = 1$. The Pythagorean Theorem says that a right triangle with legs of length
$1$ has a hypotenuse of length $c$ satisfying $c^2 = 2$. However, there is no rational
number whose square is $2$. We will prove this using the following theorem:


Theorem 1.3.9. If $k$ is an integer and the equation $x^2 = k$ has a rational solution,
then that solution is actually an integer.


Proof. Suppose $r$ is a rational number such that $r^2 = k$. Let $r = \frac{n}{m}$ be expressed
as a fraction in which $m$ is a positive integer and $n$ and $m$ have no common factors
other than $1$. Then,

$$\left(\frac{n}{m}\right)^2 = k \quad \text{and so} \quad n^2 = m^2 k.$$

This equation implies that $m$ divides $n^2$. If $m \ne 1$, then $m$ can be
expressed as a product of primes (Example 1.2.9), and each of these primes must also divide $n^2$.
However, if a prime number divides $n^2$, it must also divide $n$ (Exercise 1.3.14).
Thus, each prime factor of $m$ divides $n$. Since $n$ and $m$ have no common factors,
this is impossible. We conclude that $m = 1$ and, hence, that $r = n$ is an integer. □




Now it is easy to see that $2$ is not the square of a rational number. If it
were, that number would have to be an integer, by the above theorem. The only
possibilities are $-1, 0, 1$ since all other integers have squares that are too large. Of
course, none of the numbers $-1, 0, 1$ has its square equal to $2$.
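
Theorem 1.3.9 reduces the search for a rational square root of an integer $k$ to a search among integers, which a short computation can carry out. The Python sketch below is an illustration under stated assumptions: it requires Python 3.8 or later for `math.isqrt`, and the names `rational_sqrt` and `no_small_fraction_squares_to` are invented here. The second function is a finite brute-force check, not a proof.

```python
from fractions import Fraction
from math import isqrt
from typing import Optional


def rational_sqrt(k: int) -> Optional[int]:
    """Return the non-negative rational solution of x^2 = k, if one exists.

    By Theorem 1.3.9 any rational solution is an integer, so it suffices
    to test the integer part of the square root.
    """
    if k < 0:
        return None
    root = isqrt(k)
    return root if root * root == k else None


def no_small_fraction_squares_to(k: int, bound: int) -> bool:
    """Check that no n/m with 1 <= n, m <= bound has square equal to k."""
    return all(
        Fraction(n, m) ** 2 != k
        for n in range(1, bound + 1)
        for m in range(1, bound + 1)
    )
```

Here `rational_sqrt(2)` returns `None`, matching the argument above, whereas a perfect square such as 49 yields its integer root.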

Other geometric objects also lead to the conclusion that the system of rational
numbers is not sufficient for the measurement of objects that occur in the natural
world. The area $\pi$ of a circle of radius $1$ is not a rational number, for example.
In fact, the rational number system is riddled with holes where there ought to be
numbers. This problem is fixed by the introduction of the system of real numbers
which is the topic of the next section.


Exercise Set 1.3 


1. Given that N has an operation of addition which is commutative and associa- 
tive, how would you define such an addition operation in Z? 


2. Referring to the previous exercise, answer the same question for the operation 
of multiplication. 


3. Prove that if $\mathbb{Z}$ satisfies the axioms for a commutative ring, then $\mathbb{Q}$ satisfies A1
and M1.

4. Prove that if $\mathbb{Z}$ satisfies the axioms for a commutative ring, then $\mathbb{Q}$ satisfies A2
and M2.

In the next three exercises you are to prove the given statement assuming $x, y, z$
are elements of a field. You may use the results of examples and theorems from
this section.

5. $(-x)(-y) = xy$.

6. $xz = yz$ implies $x = y$, provided $z \ne 0$.

7. $xy = 0$ implies $x = 0$ or $y = 0$.

In the next three exercises you are to prove the given statement assuming $x, y, z$
are elements of an ordered field. Again, you may use the results of examples and
theorems from this section.

8. $x > 0$ and $y > 0$ imply $xy > 0$.

9. $x > 0$ implies $x^{-1} > 0$.

10. $0 < x < y$ implies $y^{-1} < x^{-1}$.

11. Prove that the equation $x^2 = 5$ has no rational solution.

12. Generalize Theorem 1.3.9 by proving that every rational solution of a
polynomial equation

$$x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0,$$

with integer coefficients $a_k$, is an integer solution.

13. Prove that if $m$ and $n$ are positive integers with no common factors other than
$1$ (i.e., $m$ and $n$ are relatively prime), then there are integers $a$ and $b$ such that
$1 = am + bn$. Hint: Let $S$ be the set of all positive integers of the form $am + bn$,
where $a$ and $b$ are integers. This set has a smallest element by Exercise 1.2.19.
Use the division algorithm (Exercise 1.2.20) to show that this smallest element
divides both $m$ and $n$.


14. Use the result of the preceding exercise to prove that if a prime p divides the 
product nm of two positive integers, then it divides n or it divides m. 


1.4. The Real Numbers 


As pointed out in the previous section, the set of rational numbers is riddled with
“holes” where there ought to be numbers. Here we will try to make this statement
more precise and then indicate how these holes can be “filled”, resulting in the
system of real numbers. In addition to the ordered field axioms, the real number
system satisfies a new axiom C, the completeness axiom. Later in the section we
will state it and explore its consequences.


The construction of the real numbers that we outline below is motivated by the
idea that a “hole” in the rational numbers is a location along the rational number
line where there should be a number but there is no rational number. What do we
mean by a “location” along the rational number line? Well, if this has meaning,
then it should make sense to talk about the rational numbers that are to the left
of this location and those that are to the right of this location. This should lead to
a separation of the rational numbers into two sets: one to the left and one to the
right of the given location. In fact, we can define a location on the rational line to
be such a separation. This leads to the notion of a Dedekind cut.


Dedekind Cuts. If $r$ is a rational number, consider the infinite interval $L_r$
consisting of all rational numbers to the left of $r$. That is,

$$L_r = \{x \in \mathbb{Q} : x < r\}. \tag{1.4.1}$$

This set is a non-empty, proper subset of $\mathbb{Q}$. It has no largest element, since, for
each $x < r$, there are rational numbers larger than $x$ that are also less than $r$ (for
example, $(x + r)/2$ is one such number). It also has the property that if $x \in L_r$, then
so is any rational number less than $x$. It turns out that there are also subsets of $\mathbb{Q}$
with these three properties that are not of the form $L_r$ for some rational number.
A subset of $\mathbb{Q}$ with these three properties is called a Dedekind cut. That is,


Definition 1.4.1. A subset $L$ of $\mathbb{Q}$ is called a Dedekind cut, or simply a cut in the
rationals, if it satisfies the following three conditions:

(a) $L \ne \emptyset$ and $L \ne \mathbb{Q}$;
(b) $L$ has no largest element;
(c) if $x \in L$, then so is every $y \in \mathbb{Q}$ with $y < x$.


The reason for calling such a set $L$ a “cut” is that, if $R$ is the complement of $L$,
then each number in $L$ is to the left of each number in $R$. Thus, the rational line
is separated or cut into left and right halves. Since each half determines the other,
we choose to focus on just the left half in this discussion.




Figure 1.4.1. A Dedekind Cut in the Rationals. 


Each rational number $r$ determines a cut, namely the set $L_r$ of (1.4.1). In this case,
$r$ is called the cut number for the Dedekind cut. Are there Dedekind cuts that are
not determined in this way? Cuts that have no rational cut number?


Example 1.4.2. Describe a Dedekind cut that is not of the form $L_r$ for a rational number $r$.


Solution: We are guided by the idea that there ought to be a number whose square is 2, but there is no such rational number. If there were a number $\sqrt{2}$ with square 2, then the set of rational numbers less than $\sqrt{2}$ could be described as

$L = \{r \in \mathbb{Q} : r > 0 \text{ and } r^2 < 2\} \cup \{r \in \mathbb{Q} : r \le 0\}$.

We claim this is a Dedekind cut not of the form $L_r$ for any $r \in \mathbb{Q}$.

Certainly $L$ is a non-empty, proper subset of $\mathbb{Q}$. It has no largest element because if $n/m$ is any positive element of $L$, then we can always choose a larger rational number which still has square less than 2 as follows: $\dfrac{kn+1}{km} > \dfrac{n}{m}$ for every $k \in \mathbb{N}$ and

$\left(\dfrac{kn+1}{km}\right)^2 = \left(\dfrac{n}{m}\right)^2 + \dfrac{1}{k}\left(\dfrac{2n}{m^2} + \dfrac{1}{km^2}\right)$.

By choosing $k$ large enough, we can make the second term on the right less than $2 - (n/m)^2$ and this will imply that $\left(\dfrac{kn+1}{km}\right)^2 < 2$. Thus, $L$ has no largest element.

If $x \in L$ and $y < x$, then either $y$ is negative, in which case it is in $L$, or $0 \le y < x$. In the latter case, $y^2 < x^2 < 2$, and so $y \in L$ in this case as well. Thus $L$ is a Dedekind cut.


We next show that there is no rational number $r$ such that $L = L_r$. If there is such a number $r$, then $r$ is a positive rational number not in $L$ and so $r^2 \ge 2$. However, there are numbers in $L$ arbitrarily close to $r$ and each of them has square less than 2. It follows that $r^2 \le 2$. This means $r^2 = 2$, which is impossible for a rational number $r$.
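The "no largest element" step of Example 1.4.2 can be tried out on a concrete fraction. The following is only an illustrative sketch, not part of the text's argument: the helper names `in_cut` and `larger_element` are made up here, and the search simply tries $k = 1, 2, 3, \ldots$ until $(kn+1)/(km)$ has square less than 2, which the computation in the example guarantees will eventually happen.

```python
from fractions import Fraction

def in_cut(q: Fraction) -> bool:
    """Membership test for the set L of Example 1.4.2 (hypothetical helper)."""
    return q <= 0 or q * q < 2

def larger_element(q: Fraction) -> Fraction:
    """Given a positive q = n/m in L, return (kn+1)/(km) in L for the first k that works."""
    if not (q > 0 and in_cut(q)):
        raise ValueError("q must be a positive element of L")
    n, m = q.numerator, q.denominator
    k = 1
    # (kn+1)/(km) = n/m + 1/(km) decreases toward n/m as k grows,
    # so its square eventually drops below 2.
    while not in_cut(Fraction(k * n + 1, k * m)):
        k += 1
    return Fraction(k * n + 1, k * m)

q = Fraction(7, 5)            # 49/25 < 2, so 7/5 lies in L
bigger = larger_element(q)
print(q, "<", bigger, "and its square is", bigger * bigger)
```

Exact rational arithmetic is used so that the comparison with 2 is not affected by rounding.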


Thus, although it might seem that every Dedekind cut ought to correspond to 
a cut number, the above example shows that this is not the case. In fact, there are 
a lot more cuts than there are rational cut numbers. However, we can fix this by 
enlarging the number system so that there is a cut number for every Dedekind cut. 
The way this is usually done is to define the new number system to actually be the 
set of all Dedekind cuts of the rationals. Below, we attempt to describe this idea 
in a way that is somewhat visually intuitive. 


We will think of a Dedekind cut L as specifying a certain location (the location 
between L and its complement R) along the rational number line. We will think 
of the real number system $\mathbb{R}$ as being the set of all such locations. Then each real 
number $x$ corresponds to a Dedekind cut $L_x$, which is to be thought of as the set of 




all rational numbers to the left of the location $x$. We next need to define an order relation and operations of addition and multiplication in $\mathbb{R}$.

The order relation on $\mathbb{R}$ is simple: we say $x \le y$ if $L_x \subset L_y$. An element $x \in \mathbb{R}$ is, then, non-negative if $L_0 \subset L_x$. With this definition of order on $\mathbb{R}$ we can assert that

$L_x = \{r \in \mathbb{Q} : r < x\}$

for all $x \in \mathbb{R}$ (not just for $x \in \mathbb{Q}$).


Addition of real numbers is defined as follows: if $x, y \in \mathbb{R}$, then we set

$L_x + L_y = \{r + s : r \in L_x,\ s \in L_y\}$.


It is easily verified that this is also a Dedekind cut (Exercise 1.4.10) and, hence, it 
corresponds to an element of R. We define x + y to be this element. 


The product of two non-negative numbers $x$ and $y$ is defined as follows: we set

$K = \{rs : r \in L_x,\ r \ge 0,\ s \in L_y,\ s \ge 0\} \cup \{t \in \mathbb{Q} : t < 0\}$.

This is a Dedekind cut (Exercise 1.4.11), and we define $xy$ to be the corresponding element of $\mathbb{R}$. For pairs of numbers where one or both is negative, the definition of product is more complicated due to the fact that multiplication by a negative number reverses order.


Of course $\mathbb{Q} \subset \mathbb{R}$, since each rational number was already the cut number of a Dedekind cut. It is easily checked that the definitions of addition, multiplication, and order given above agree with the usual ones in the case that the numbers are rational.


The numbers in R that are not in Q are called irrational numbers. It turns 
out that there are many more irrational numbers than there are rational numbers. 
Making sense of this statement requires a discussion of finite sets and infinite sets 
and how some infinite sets are larger than others. We present such a discussion in 
the appendix. 


The Completeness Axiom. This is the property of the real number system 
that distinguishes it from the rational number system. Without it, most of the 
theorems of calculus would not be true. 

A subset $A$ of an ordered field $F$ is said to be bounded above if there is an element $m \in F$ such that $x \le m$ for every $x \in A$. The element $m$ is called an upper bound for $A$. If, among all upper bounds for $A$, there is one which is smallest (less than all the others), then we say that $A$ has a least upper bound.


Definition 1.4.3. An ordered field F is said to be complete if it satisfies: 


C. Each non-empty subset of F which is bounded above has a least upper bound. 


If one defines the real number system R in terms of Dedekind cuts of the 
rationals and defines addition, multiplication, and order as above, then one can 
prove that the resulting system is an ordered field. To carry out all the details of 
this proof is a long and tedious process and it will not be done here. However, 
it is quite easy to prove that R, as defined in this way, satisfies the completeness 
axiom C. 




Theorem 1.4.4. If $\mathbb{R}$ is defined using Dedekind cuts of $\mathbb{Q}$, as above, then every non-empty subset of $\mathbb{R}$ which is bounded above has a least upper bound.


Proof. Let $A$ be a non-empty subset of $\mathbb{R}$ which is bounded above and let $m$ be any upper bound for $A$. For each $x \in A$, let $L_x$ be the corresponding cut in $\mathbb{Q}$. Then $x \le m$ for all $x \in A$ means that $L_x \subset L_m$ for all $x \in A$. We set

$L = \bigcup_{x \in A} L_x$.

Then $L$ is non-empty because $A$ is non-empty and each $L_x$ is non-empty, and $L$ is a proper subset of $\mathbb{Q}$ because $L \subset L_m$. If $r \in L$ and $s < r$, then $r \in L_x$ for some $x \in A$ and this implies $s \in L_x$ and, hence, $s \in L$. If $L$ had a largest element $t$, then $t$ would belong to $L_x$ for some $x$, and it would have to be a largest element for $L_x$, which is a contradiction. Thus, $L$ has no largest element. We have now proved that $L$ satisfies (a), (b), and (c) of Definition 1.4.1 and, hence, that $L$ is a Dedekind cut.

If $y$ is the real number corresponding to $L$, that is, if $L = L_y$, then, for all $x \in A$, $L_x \subset L_y$, and this means $x \le y$. Thus, $y$ is an upper bound for $A$. Also, $L_y \subset L_m$ means that $y \le m$. Since $m$ was an arbitrary upper bound for $A$, this implies that $y$ is the least upper bound for $A$. This completes the proof. □


This completes our outline of the construction of the real number system begin- 
ning with Peano’s axioms for the natural numbers. The final result is the following 
theorem, which we will state without further proof. It will be the starting point for 
our development of calculus. 


Theorem 1.4.5. The real number system $\mathbb{R}$ is a complete ordered field.


Example 1.4.6. Find all upper bounds and the least upper bound for the following 
sets: 


$A = (-1, 2) = \{x \in \mathbb{R} : -1 < x < 2\}$;
$B = (0, 3] = \{x \in \mathbb{R} : 0 < x \le 3\}$.

Solution: The set of all upper bounds for the set $A$ is $\{x \in \mathbb{R} : x \ge 2\}$. The smallest element of this set (the least upper bound of $A$) is 2. Note that 2 is not actually in the set $A$.

The set of all upper bounds for $B$ is the set $\{x \in \mathbb{R} : x \ge 3\}$. The smallest element of this set is 3 and so it is the least upper bound of $B$. Note that, in this case, the least upper bound is an element of the set $B$.


If the least upper bound of a set A does belong to A, then it is called the 
maximum of A. Note that a non-empty set which is bounded above always has 
a least upper bound, by axiom C. However, the preceding example shows that it 
need not have a maximum. 


The Archimedean Property. An ordered field always contains a copy of the 
natural numbers and, hence, a copy of the integers (Exercise 1.4.5). Thus, the 
following definition makes sense. 




Definition 1.4.7. An ordered field is said to have the Archimedean property if, for every $x$ in the field, there is a natural number $n$ such that $x < n$. An ordered field with the Archimedean property is called an Archimedean ordered field.


Theorem 1.4.8. The field of real numbers has the Archimedean property. 


Proof. We use the completeness property. Suppose there is an $x$ such that $n \le x$ for all $n \in \mathbb{N}$. Then $\mathbb{N}$ is a non-empty subset of $\mathbb{R}$ which is bounded above. By the completeness property, there is a least upper bound $b$ for $\mathbb{N}$. Then $b$ is an upper bound for $\mathbb{N}$, but $b - 1$ is not. This implies there is an $n \in \mathbb{N}$ such that $b - 1 < n$. Then $b < n + 1$, which contradicts the statement that $b$ is an upper bound for $\mathbb{N}$. Thus, the assumption that $\mathbb{N}$ is bounded above by some $x \in \mathbb{R}$ has led to a contradiction. We conclude that every $x$ in $\mathbb{R}$ is less than some natural number. This completes the proof. □


The Archimedean property can be stated in any one of several equivalent ways. One of these is: for every real number $x > 0$, there is an $n \in \mathbb{N}$ such that $1/n < x$ (Example 1.4.9). Another is: given real numbers $x$ and $y$ with $x > 0$, there is an $n \in \mathbb{N}$ such that $nx > y$ (Exercise 1.4.6).

Example 1.4.9. Prove that, in an Archimedean field, for each $x > 0$ there is an $n \in \mathbb{N}$ such that $1/n < x$.

Solution: The Archimedean property tells us that there is a natural number $n > 1/x$. Since $n$ and $x$ are positive, this inequality is preserved when we multiply it by $x$ and divide it by $n$. This yields $1/n < x$, as required.
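For a rational $x > 0$ the natural number promised by Example 1.4.9 can be written down explicitly. The sketch below is an illustration only and assumes $x$ is given as an exact fraction; the function name `archimedean_n` is invented for this example.

```python
from fractions import Fraction
from math import floor

def archimedean_n(x: Fraction) -> int:
    """Return a natural number n with 1/n < x, for a rational x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    # floor(1/x) + 1 is the first natural number strictly greater than 1/x.
    return floor(1 / x) + 1

for x in (Fraction(3, 7), Fraction(1, 1000), Fraction(5)):
    n = archimedean_n(x)
    print(f"x = {x}: n = {n}, and 1/n = {Fraction(1, n)} < x")
```

The choice $n = \lfloor 1/x \rfloor + 1$ mirrors the step "$n > 1/x$" in the solution; any larger $n$ would work equally well.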


Another consequence of the Archimedean property is that there is a rational 
number between each distinct pair of real numbers (Exercise 1.4.7). 


Exercise Set 1.4 


1. For each of the following sets, describe the set of all upper bounds for the set:
(a) the set of odd integers;
(b) $\{1 - 1/n : n \in \mathbb{N}\}$;
(c) $\{r \in \mathbb{Q} : r^3 < 8\}$;
(d) $\{\sin x : x \in \mathbb{R}\}$.

2. For each of the sets in (a), (b), (c) of the preceding exercise, find the least upper 
bound of the set, if it exists. 

3. Prove that if a subset A of R is bounded above, then the set of all upper bounds 
for A is a set of the form $[x, \infty)$. What is $x$? 


4. Show that the set $A = \{x : x^2 < 1 - x\}$ is bounded above, and then find its 
least upper bound. 


5. If $F$ is an ordered field, prove that there is a sequence of elements $\{n_k\}_{k \in \mathbb{N}}$, all different, such that $n_1 = 1$ (the identity element of $F$) and $n_{k+1} = n_k + 1$ for each $k \in \mathbb{N}$. Argue that the terms of this sequence form a subset of $F$ which is a copy of the natural numbers, by showing that the correspondence $k \to n_k$ is a one-to-one function from $\mathbb{N}$ onto this subset. By definition it takes the successor $k + 1$ of an element $k \in \mathbb{N}$ to the successor $n_k + 1$ of its image $n_k$.


6. Let F be an ordered field. We consider N to be a subset of F as described in 
the preceding exercise. Prove that F is Archimedean if and only if, for each 
pair $x, y \in F$ with $x > 0$, there exists a natural number $n$ such that $nx > y$. 


7. Prove that if $x < y$ are two real numbers, then there is a rational number $r$ with $x < r < y$. Hint: Use the result of Example 1.4.9.

8. Prove that if $x$ is irrational and $r$ is a non-zero rational number, then $x + r$ and $rx$ are also irrational.

9. We know that $\sqrt{2}$ is irrational. Use this fact and the previous exercise to prove that if $r < s$ are rational numbers, then there is an irrational number $x$ with $r < x < s$.


The following exercises concern Dedekind cuts of the rationals and should be done 
using only properties of the rational number system and the definition of Dedekind 
cut. 


10. Show that if $L_x$ and $L_y$ are Dedekind cuts defining real numbers $x$ and $y$, then

$L_x + L_y = \{r + s : r \in L_x \text{ and } s \in L_y\}$

is also a Dedekind cut (this is the Dedekind cut determining the sum $x + y$).

11. If $L_x$ and $L_y$ are Dedekind cuts determining positive real numbers $x$ and $y$ and if we set

$K = \{rs : 0 \le r \in L_x \text{ and } 0 \le s \in L_y\} \cup \{t \in \mathbb{Q} : t < 0\}$,

then $K$ is also a Dedekind cut (this is the Dedekind cut determining the product $xy$).

12. If $L$ is the Dedekind cut of Example 1.4.2 and $L$ determines the real number $x$ (so that $L = L_x$), prove that $L_{x^2} = L_2$. Thus, the real number corresponding to $L$ has square 2.


1.5. Sup and Inf 


The concept of least upper bound, which appears in the completeness axiom, will be extremely important in this course. It will be examined in detail in this section. We first note that there is a companion concept for sets that are bounded below.


Greatest Lower Bound. We say a set $A$ is bounded below if there is a number $m$ such that $m \le x$ for every $x \in A$. The number $m$ is called a lower bound for $A$. 
A greatest lower bound for A is a lower bound that is larger than any other lower 
bound. 


Theorem 1.5.1. Every non-empty subset of R that is bounded below has a greatest 
lower bound. 




Proof. Suppose $A$ is a non-empty subset of $\mathbb{R}$ which is bounded below. We must show that there is a lower bound for $A$ which is greater than any other lower bound for $A$. If $m$ is any lower bound for $A$, then Example 1.3.8(a) implies that $-m$ is an upper bound for $-A = \{-x : x \in A\}$. Since $\mathbb{R}$ is a complete ordered field, there is a least upper bound $r$ for $-A$. Then

$-x \le r$ for all $x \in A$ and $r \le -m$.

Applying Example 1.3.8(a) yields that

$-r \le x$ for all $x \in A$ and $m \le -r$.

Thus, $-r$ is a lower bound for $A$ and, since $m$ was an arbitrary lower bound, the inequality $m \le -r$ implies that $-r$ is the greatest lower bound. □


The Extended Real Numbers. For many reasons, it is convenient to extend the real number system by adjoining two new points $\infty$ and $-\infty$. The resulting set is called the extended real number system. We declare that $\infty$ is greater than every other extended real number and $-\infty$ is less than every other extended real number. This makes the extended real number system an ordered set. We also define $x + \infty$ to be $\infty$ if $x$ is any extended real number other than $-\infty$. Similarly, $x - \infty = x + (-\infty)$ is defined to be $-\infty$ if $x$ is any extended real number other than $\infty$. Of course, there is no reasonable way to make sense of $\infty - \infty$.


The introduction of the extended real number system is just a convenient no- 
tational convention. For example, it allows us to make the following definition. 


Sup and Inf. 


Definition 1.5.2. Let $A$ be an arbitrary non-empty subset of $\mathbb{R}$. We define the supremum of $A$, denoted $\sup A$, to be the smallest extended real number $M$ such that $a \le M$ for every $a \in A$.

The infimum of $A$, denoted $\inf A$, is the largest extended real number $m$ such that $m \le a$ for all $a \in A$.

Note that, if $A$ is bounded above, then $\sup A$ is the least upper bound of $A$. If $A$ is not bounded above, then the only extended real number $M$ with $a \le M$ for all $a \in A$ is $\infty$, and so $\sup A = \infty$ in this case. Similarly, $\inf A$ is the greatest lower bound of $A$ if $A$ is bounded below and is $-\infty$ if $A$ is not bounded below. Thus, $\sup A$ and $\inf A$ exist as extended real numbers for any non-empty set $A$, but they might not be finite. Also note that, even when they are finite real numbers, they may not actually belong to $A$, as Example 1.4.6 shows.


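Example 1.5.3. Find the sup and inf of the following sets:

$A = (-1, 1] = \{x \in \mathbb{R} : -1 < x \le 1\}$;
$B = (-\infty, 5) = \{x \in \mathbb{R} : x < 5\}$;
(1.5.1) $C = \left\{\dfrac{n^2}{n+1} : n \in \mathbb{N}\right\}$;
(1.5.2) $D = \left\{\dfrac{1}{n} : n \in \mathbb{N}\right\}$.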



Solution: Clearly, inf A = —1 and sup A = 1. These are finite, sup A belongs 
to A, but inf A does not. 


Also, $\inf B = -\infty$ and $\sup B = 5$. In this case, the inf is not finite. The sup is finite but does not belong to $B$.

Since $\dfrac{n^2}{n+1} \ge \dfrac{n^2}{2n} = \dfrac{n}{2}$, the set $C$ is unbounded, and so $\sup C = \infty$. Also, we have $n + 1 \le n^2 + n^2 = 2n^2$, and so

$\dfrac{1}{2} \le \dfrac{n^2}{n+1}$

for all $n \in \mathbb{N}$. Thus, 1/2 is a lower bound for $C$. It is the greatest lower bound, since it actually belongs to $C$, due to the fact that $\dfrac{n^2}{n+1} = \dfrac{1}{2}$ when $n = 1$. Thus, $\inf C = 1/2$.

Certainly 0 is a lower bound for the set $D$. It follows from the Archimedean property (see Example 1.4.9) that there is no $x \in \mathbb{R}$ with $x > 0$ which is a lower bound for this set, and so 0 is the greatest lower bound. Thus, $\inf D = 0$. Clearly, $\sup D = 1$.
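The conclusions about $C = \{n^2/(n+1)\}$ and $D = \{1/n\}$ in Example 1.5.3 can be sanity-checked on an initial stretch of each set. This is a numerical illustration under the assumption that only the first 200 terms are inspected, so it supports the claims but does not prove them; the helper names `c` and `d` are made up for the sketch.

```python
from fractions import Fraction

def c(n: int) -> Fraction:
    """n-th element of C = {n^2/(n+1)}."""
    return Fraction(n * n, n + 1)

def d(n: int) -> Fraction:
    """n-th element of D = {1/n}."""
    return Fraction(1, n)

indices = range(1, 201)
terms_c = [c(n) for n in indices]
terms_d = [d(n) for n in indices]

print("smallest sampled element of C:", min(terms_c))   # attained at n = 1
print("largest sampled element of C: ", max(terms_c))   # keeps growing with n
print("largest sampled element of D: ", max(terms_d))   # attained at n = 1
print("smallest sampled element of D:", min(terms_d))   # positive, but shrinking toward 0
```

Every sampled element of $C$ is at least $n/2$, which is why no real number can bound $C$ above, while the elements of $D$ stay positive yet fall below any fixed positive number once $n$ is large.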


If A is a set of numbers and sup A actually belongs to A, then it is called the 
maximum of A and is denoted max A. Similarly, if inf A belongs to A, then it is 
called the minimum of A and is denoted min A. 


The following theorem is really just a restatement of the definition of sup, but 
it may give some helpful insight. It says that sup A is the dividing point between 
the numbers which are upper bounds for A (if there are any) and the numbers 
which are not upper bounds for A. A similar theorem holds for inf. Its formulation 
and proof are left to the exercises. 


Theorem 1.5.4. Let $A$ be a non-empty subset of $\mathbb{R}$ and let $x$ be an element of $\mathbb{R}$. Then

(a) $\sup A \le x$ if and only if $a \le x$ for every $a \in A$;
(b) $x < \sup A$ if and only if $x < a$ for some $a \in A$.

Proof. (a) By definition $a \le x$ for every $a \in A$ if and only if $x$ is an upper bound for $A$. If $x$ is an upper bound for $A$, then $A$ is bounded above. This implies its sup is its least upper bound, which is necessarily less than or equal to $x$.

Conversely, if $\sup A \le x$, then $\sup A$ is finite and is the least upper bound for $A$. Since $\sup A \le x$, $x$ is also an upper bound for $A$. Thus, $\sup A \le x$ if and only if $a \le x$ for every $a \in A$.

(b) If $x < \sup A$, then $x$ is not an upper bound for $A$, which means that $x < a$ for some $a \in A$. Conversely, if $x < a$ for some $a \in A$, then $x < \sup A$, since $a \le \sup A$. Thus, $x < \sup A$ if and only if $x < a$ for some $a \in A$. □
Example 1.5.5. If $A = \left\{\dfrac{4n-1}{6n+3} : n \in \mathbb{N}\right\}$, find the set of all upper bounds for $A$.

Solution: Long division yields

$\dfrac{4n-1}{6n+3} = \dfrac{2}{3} - \dfrac{1}{2n+1} < \dfrac{2}{3}$.

Thus, 2/3 is an upper bound for $A$. If $x < 2/3$, then $\epsilon = 2/3 - x$ is positive, and the Archimedean property implies we can choose $n$ large enough that

$\dfrac{1}{2n+1} < \epsilon$.

Then

$x < \dfrac{2}{3} - \dfrac{1}{2n+1} = \dfrac{4n-1}{6n+3}$

for such an $n$, which means that $x$ is not an upper bound for $A$.

We conclude that 2/3 is the least upper bound for $A$; that is, $\sup A = 2/3$. By the previous theorem, the set of all upper bounds for $A$ is the interval $[2/3, \infty)$.
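The two steps of Example 1.5.5, the long-division identity and the choice of $n$ for a given $x < 2/3$, can be checked with exact fractions. The sketch below is illustrative only; `a` and `witness` are names invented here, and the linear search for $n$ is just one simple way to realize the Archimedean step.

```python
from fractions import Fraction

TWO_THIRDS = Fraction(2, 3)

def a(n: int) -> Fraction:
    """n-th element (4n - 1)/(6n + 3) of the set A."""
    return Fraction(4 * n - 1, 6 * n + 3)

def witness(x: Fraction) -> int:
    """For x < 2/3, return an n with x < a(n), so x is not an upper bound."""
    eps = TWO_THIRDS - x
    if eps <= 0:
        raise ValueError("x must be less than 2/3")
    n = 1
    while Fraction(1, 2 * n + 1) >= eps:   # stop once 1/(2n+1) < eps
        n += 1
    return n

x = Fraction(33, 50)                        # 0.66, slightly below 2/3
n = witness(x)
print(f"n = {n}: a(n) = {a(n)} exceeds x = {x} but stays below 2/3")
```

Because $a(n) = 2/3 - 1/(2n+1)$, the loop stops exactly when $a(n)$ first exceeds $x$.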


Example 1.5.6. If $A = \left\{\dfrac{n^2}{n+1} : n \in \mathbb{N}\right\}$, find $\sup A$ and the set of all upper bounds for $A$.

Solution: Long division yields

$\dfrac{n^2}{n+1} = n - 1 + \dfrac{1}{n+1} > n - 1$.

Then the Archimedean property implies that there are no upper bounds for $A$, since, for every $x \in \mathbb{R}$, there is an $n \in \mathbb{N}$ for which $n - 1$ is larger than $x$. Thus, the set of upper bounds for $A$ is the empty set and $\sup A = \infty$.


Properties of Sup and Inf. The next theorem uses the following notation concerning subsets $A$ and $B$ of $\mathbb{R}$:

$-A = \{-a : a \in A\}$;
$A + B = \{a + b : a \in A,\ b \in B\}$;
$A - B = \{a - b : a \in A,\ b \in B\}$.

Theorem 1.5.7. Let $A$ and $B$ be non-empty subsets of $\mathbb{R}$. Then
(a) $\inf A \le \sup A$;
(b) $\sup(-A) = -\inf A$ and $\inf(-A) = -\sup A$;
(c) $\sup(A + B) = \sup A + \sup B$ and $\inf(A + B) = \inf A + \inf B$;
(d) $\sup(A - B) = \sup A - \inf B$;
(e) if $A \subset B$, then $\sup A \le \sup B$ and $\inf B \le \inf A$.


Proof. We will prove (a), (b), and (c) and leave (d) and (e) to the exercises.

(a) If $A$ is non-empty, then there is an element $a \in A$. Since $\inf A$ is a lower bound and $\sup A$ an upper bound for $A$, we have $\inf A \le a \le \sup A$.

(b) A number $x$ is a lower bound for the set $A$ ($x \le a$ for all $a \in A$) if and only if $-x$ is an upper bound for the set $-A$ ($-a \le -x$ for all $a \in A$). Thus, if $L$ is the set of all lower bounds for $A$, then $-L$ is the set of all upper bounds for $-A$. Furthermore, the largest member of $L$ and the smallest member of $-L$ are negatives of each other. That is, $-\inf A = \sup(-A)$. This is the first equality in (b). If we apply this result with $-A$ replacing $A$, we have $-\inf(-A) = \sup A$. If we multiply this by $-1$, we get the second equality in (b).




(c) Since $a \le \sup A$ and $b \le \sup B$ for all $a \in A$, $b \in B$, we have

$a + b \le \sup A + \sup B$ for all $a \in A$, $b \in B$.

It follows that

$\sup(A + B) \le \sup A + \sup B$.

Let $x$ be any number less than $\sup A + \sup B$. We claim that there are elements $a \in A$ and $b \in B$ such that

(1.5.3) $x < a + b$.

Once proved, this will imply that no number less than $\sup A + \sup B$ is an upper bound for $A + B$. Thus, proving this claim will establish that $\sup(A + B) = \sup A + \sup B$.

There are two cases to consider: $\sup B$ finite and $\sup B = \infty$. If $\sup B$ is finite, then $x - \sup B < \sup A$, and Theorem 1.5.4 implies there is an $a \in A$ with $x - \sup B < a$. Then $x - a < \sup B$. Applying Theorem 1.5.4 again, we conclude there is a $b \in B$ with $x - a < b$. This implies (1.5.3) and proves our claim in the case where $\sup B$ is finite.

Now suppose $\sup B = \infty$. Let $a$ be any element of $A$. Then $x - a < \sup B = \infty$ and so, as above, we conclude from Theorem 1.5.4 that there is a $b \in B$ satisfying $x - a < b$. This implies (1.5.3), which establishes our claim in this case and completes the proof. □
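For finite sets the sup and inf are simply the maximum and minimum, so the identities of Theorem 1.5.7 can be checked directly. The sets below are arbitrary sample data chosen for this sketch, not taken from the text, and a finite check of this kind is an illustration rather than a proof.

```python
from fractions import Fraction
from itertools import product

# Arbitrary finite sample sets (assumed for illustration only).
A = {Fraction(-1), Fraction(1, 2), Fraction(3)}
B = {Fraction(-2), Fraction(0), Fraction(5, 4)}

neg_A = {-a for a in A}                           # the set -A
A_plus_B = {a + b for a, b in product(A, B)}      # the set A + B
A_minus_B = {a - b for a, b in product(A, B)}     # the set A - B

print("sup(-A)    =", max(neg_A), "  -inf A        =", -min(A))
print("sup(A + B) =", max(A_plus_B), " sup A + sup B =", max(A) + max(B))
print("inf(A + B) =", min(A_plus_B), " inf A + inf B =", min(A) + min(B))
print("sup(A - B) =", max(A_minus_B), "  sup A - inf B =", max(A) - min(B))
```

Part (d) shows why the inf of $B$ appears: subtracting the smallest element of $B$ produces the largest difference.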


Sup and Inf for Functions. If $f$ is a real-valued function defined on some set $X$ and if $A$ is a subset of $X$, then

$f(A) = \{f(x) : x \in A\}$

is a set of real numbers, and so we can take its sup and inf.

Definition 1.5.8. If $f : X \to \mathbb{R}$ is a function and $A \subset X$, then we set

$\sup_A f = \sup\{f(x) : x \in A\}$ and $\inf_A f = \inf\{f(x) : x \in A\}$.

Thus, $\sup_A f$ is the supremum of the set of values that $f$ assumes on $A$ and $\inf_A f$ is the infimum of this set. They themselves may or may not be values that $f$ assumes on $A$. If $\sup_A f$ is a value that $f$ assumes on $A$, then it is called the maximum of $f$ on $A$. Similarly, if $\inf_A f$ is a value assumed by $f$ somewhere on $A$, then it is called the minimum of $f$ on $A$.


Example 1.5.9. Find $\sup_I f$ and $\inf_I f$ if
(a) $f(x) = \sin x$ and $I = [-\pi/2, \pi/2)$;
(b) $f(x) = 1/x$ and $I = (0, \infty)$.

Solution: (a) The function $\sin x$ takes on all values in the interval $[-1, 1)$ on $I$ but does not take on the value 1. Thus, $\inf_I f = -1$ and $\sup_I f = 1$. In this case, $\inf_I f$ is a value assumed by $f$ on $I$, but $\sup_I f$ is not.

(b) The function $1/x$ takes on all values in the open interval $(0, \infty)$. Thus, $\inf_I f = 0$ and $\sup_I f = \infty$ in this case. Neither one of these extended real numbers is a value taken on by $f$ on $I$.




The following theorem concerning sup and inf for functions follows easily from 
Theorem 1.5.7. We leave the details to the exercises. 


Theorem 1.5.10. Let $f$ and $g$ be functions defined on a set containing $A$ as a subset, and let $c \in \mathbb{R}$ be a positive constant. Then
(a) $\sup_A cf = c \sup_A f$ and $\inf_A cf = c \inf_A f$;
(b) $\sup_A(-f) = -\inf_A f$;
(c) $\sup_A(f + g) \le \sup_A f + \sup_A g$ and $\inf_A f + \inf_A g \le \inf_A(f + g)$;
(d) $\sup\{f(x) - f(y) : x, y \in A\} = \sup_A f - \inf_A f$.
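Part (c) of Theorem 1.5.10 is only an inequality, and a small example shows that it can be strict. The sketch below assumes a finite sample set $A$ of 101 equally spaced points in $[0, 1]$ and the hypothetical functions $f(x) = x$ and $g(x) = 1 - x$; none of these choices come from the text.

```python
from fractions import Fraction

# Assumed sample set: 101 equally spaced points of [0, 1].
A = [Fraction(k, 100) for k in range(101)]

def f(x):
    return x

def g(x):
    return 1 - x

def sup_on(h, points):
    """Sup of h over a finite set of points (a maximum)."""
    return max(h(x) for x in points)

def inf_on(h, points):
    """Inf of h over a finite set of points (a minimum)."""
    return min(h(x) for x in points)

sup_sum = sup_on(lambda x: f(x) + g(x), A)       # f + g is constantly 1
inf_sum = inf_on(lambda x: f(x) + g(x), A)
oscillation = max(f(x) - f(y) for x in A for y in A)

print("sup(f + g) =", sup_sum, "but sup f + sup g =", sup_on(f, A) + sup_on(g, A))
print("inf(f + g) =", inf_sum, "but inf f + inf g =", inf_on(f, A) + inf_on(g, A))
print("sup{f(x) - f(y)} =", oscillation)
```

The inequality is strict here because $f$ and $g$ attain their largest values at different points of $A$, whereas in part (d) the points $x$ and $y$ vary independently.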


Exercise Set 1.5 


1. For each of the following sets, find the set of all extended real numbers $x$ that are greater than or equal to every element of the set. Then find the sup of the set. Does the set have a maximum?

(a) $(-10, 10)$.
(b) $\{n^2 : n \in \mathbb{N}\}$.
(c) $\left\{\dfrac{2n+1}{n+1} : n \in \mathbb{N}\right\}$.

2. Find the sup and inf of the following sets. Tell whether each set has a maximum or a minimum.
(a) (-2, ae 
n+ 
bree 
(») {x + i} 
(c) $\{n/m : n, m \in \mathbb{Z},\ n^2 < 5m\}$.

3. Prove that if $\sup A < \infty$, then for each $n \in \mathbb{N}$ there is an element $a_n \in A$ such that $\sup A - 1/n < a_n \le \sup A$.

4. Prove that if $\sup A = \infty$, then for each $n \in \mathbb{N}$ there is an element $a_n \in A$ such that $a_n > n$.


5. Formulate and prove the analog of Theorem 1.5.4 for inf.
6. Prove part (d) of Theorem 1.5.7.
7. Prove part (e) of Theorem 1.5.7.
8. If $A$ and $B$ are two non-empty sets of real numbers, then prove that
$\sup(A \cup B) = \max\{\sup A, \sup B\}$ and $\inf(A \cup B) = \min\{\inf A, \inf B\}$.


9. Find $\sup_I f$ and $\inf_I f$ for the following functions $f$ and sets $I$. Which of these 
is actually the maximum or the minimum of the function $f$ on $I$? 
(a) 1G) ae Des [1,1]. 
(b) f(a) = (1,2). 
(c) f(x) ae T= (0,1). 
10. Prove (a) of Theorem 1.5.10. 




11. Prove (b) of Theorem 1.5.10.
12. Prove (c) of Theorem 1.5.10.
13. Prove (d) of Theorem 1.5.10.


Chapter 2 


Sequences 


In this chapter we have our first encounter with the concept of limit — the concept 
that lies at the heart of the calculus. We first study limits of sequences of real 
numbers. Limits of functions will be studied in the next chapter. 


2.1. Limits of Sequences 


Limits make sense in any context in which we have a notion of distance between 
objects. Thus, we begin with a discussion of the notion of distance between two 
real numbers. 


Distance and Absolute Value. Recall that the absolute value $|x|$ of a number $x$ is defined by

$|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$

Thus, $|x|$ is always a non-negative number. It can be thought of as the distance from $x$ to 0. For example,

$|3| = |-3| = 3$

just means that the distance from 3 to 0 and the distance from $-3$ to 0 are the same, namely 3. More generally, if $x$ and $y$ are any two real numbers, the distance from $x$ to $y$ is $|x - y|$.

We will often need to specify that a number $x$ is close to another number $a$. However, this doesn't mean anything unless we specify how close. If $\epsilon$ is a positive number, then the statement "$x$ is within $\epsilon$ of $a$" does have meaning. It means that the distance between $x$ and $a$ is less than $\epsilon$; that is,

$|x - a| < \epsilon$.

This statement also means that $x$ is in the open interval of radius $\epsilon$, centered at $a$, as pointed out in part (b) of the following theorem.




Theorem 2.1.1. If $x$, $y$, $a$, and $\epsilon$ are real numbers with $\epsilon > 0$, then

(a) $|y| < \epsilon$ if and only if $-\epsilon < y < \epsilon$;
(b) $|x - a| < \epsilon$ if and only if $a - \epsilon < x < a + \epsilon$.

These statements remain true if "$<$" is replaced by "$\le$".

Proof. To prove (a), we consider two cases:

(1) Suppose $y \ge 0$. Then $|y| = y$, and so $|y| < \epsilon$ if and only if $y < \epsilon$. The latter statement means the same as $-\epsilon < y < \epsilon$, because $-\epsilon < y$ is automatically true in this case.

(2) Suppose $y < 0$. Then $|y| = -y$, and so $|y| < \epsilon$ if and only if $-y < \epsilon$. This is true if and only if $-\epsilon < y$, which is true if and only if $-\epsilon < y < \epsilon$, because $y < \epsilon$ is automatically true in this case.

Part (b) follows from part (a). That is, if we apply part (a) with $y = x - a$, then we conclude that $|x - a| < \epsilon$ if and only if $-\epsilon < x - a < \epsilon$, and this is true if and only if $a - \epsilon < x < a + \epsilon$.

If "$<$" is replaced by "$\le$", the proofs of (a) and (b) remain the same. □


The following theorem will be used extensively throughout the text. 
Theorem 2.1.2 (Triangle Inequality). If $a$ and $b$ are real numbers, then
(a) $|a + b| \le |a| + |b|$ and
(b) $\big||a| - |b|\big| \le |a - b|$.

Proof. For part (a), we observe that $-|a| \le a \le |a|$ and $-|b| \le b \le |b|$. If we add these inequalities, the result is

$-(|a| + |b|) \le a + b \le |a| + |b|$.

By the preceding theorem (with "$<$" replaced by "$\le$"), this is equivalent to $|a + b| \le |a| + |b|$. This proves part (a).

For part (b), we note that part (a) implies $|a| = |b + (a - b)| \le |b| + |a - b|$ and this yields

(2.1.1) $|a| - |b| \le |a - b|$

when we subtract $|b|$ from both sides. If we interchange $b$ and $a$, then the right side of this inequality stays the same and the left side becomes $|b| - |a|$. Thus, the inequality

$|b| - |a| \le |a - b|$

also holds. This and (2.1.1) together imply part (b). □
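Both parts of the triangle inequality are easy to spot-check on sample values. The sketch below draws pseudo-random rationals with a fixed seed (an arbitrary choice made for reproducibility); passing such a check is of course no substitute for the proof above.

```python
import random
from fractions import Fraction

rng = random.Random(0)   # fixed seed so the sample is reproducible

def random_rational() -> Fraction:
    """A pseudo-random rational with small numerator and denominator."""
    return Fraction(rng.randint(-50, 50), rng.randint(1, 20))

pairs = [(random_rational(), random_rational()) for _ in range(1000)]

# Part (a): |a + b| <= |a| + |b|
part_a = all(abs(a + b) <= abs(a) + abs(b) for a, b in pairs)
# Part (b): ||a| - |b|| <= |a - b|
part_b = all(abs(abs(a) - abs(b)) <= abs(a - b) for a, b in pairs)

print("part (a) holds on the sample:", part_a)
print("part (b) holds on the sample:", part_b)
```

Equality in part (a) occurs exactly when $a$ and $b$ do not have opposite signs, which is worth keeping in mind when estimating.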


Sequences. A sequence of real numbers is a function from the natural numbers to the real numbers. That is, it is an assignment of a real number $a_n$ to each natural number $n$. Traditionally, we use the notation

$\{a_n\}_{n=1}^{\infty}$ or simply $\{a_n\}$

to denote a sequence, rather than using standard function notation. Alternatively, we may describe a sequence by writing out its first few terms and possibly its nth term:

$a_1, a_2, a_3, \ldots$ or $a_1, a_2, a_3, \ldots, a_n, \ldots$.

Example 2.1.3. Write each of the following sequences in the form $a_1, a_2, a_3, \ldots, a_n, \ldots$:
(a) the sequence $\{(-1)^n 1/n\}$;
(b) the sequence of positive even integers;
(c) the sequence defined inductively by $a_1 = 2$ and $a_{n+1} = \dfrac{a_n + 1}{2}$.

Solution: The answers are
(a) $-1, 1/2, -1/3, \ldots, (-1)^n 1/n, \ldots$;
(b) $2, 4, 6, \ldots, 2n, \ldots$;
(c) $2, 3/2, 5/4, \ldots, 1 + 1/2^{n-1}, \ldots$.


The first two are obvious. For (c), we prove that $a_n = 1 + 1/2^{n-1}$ by induction. This is certainly true for $n = 1$. If it is true for an integer $n$, then $a_n = 1 + 1/2^{n-1}$ and so

$a_{n+1} = (a_n + 1)/2 = (1 + 1/2^{n-1} + 1)/2 = 1 + 1/2^n$.

Thus, our formula for $a_n$ is true for $n + 1$ if it is true for $n$. By induction, it is true for all natural numbers.
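The closed form in part (c) of Example 2.1.3 can be compared with the recursion term by term. This is a small sketch using exact fractions; the function names `recursive_terms` and `closed_form` are invented for the illustration, and checking twenty terms complements the induction but does not replace it.

```python
from fractions import Fraction

def recursive_terms(count: int) -> list:
    """First `count` terms of a_1 = 2, a_{n+1} = (a_n + 1)/2."""
    terms = [Fraction(2)]
    while len(terms) < count:
        terms.append((terms[-1] + 1) / 2)
    return terms

def closed_form(n: int) -> Fraction:
    """The formula a_n = 1 + 1/2^(n-1) proved by induction."""
    return 1 + Fraction(1, 2 ** (n - 1))

from_recursion = recursive_terms(20)
from_formula = [closed_form(n) for n in range(1, 21)]

print([str(t) for t in from_recursion[:5]])
print("recursion and formula agree:", from_recursion == from_formula)
```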


It is sometimes convenient to begin the indexing of a sequence at some integer $k$ other than 1. For example, the sequence

$1, 2, 2^2, 2^3, \ldots$

has description $n \to 2^{n-1}$ as a function from the natural numbers to the real numbers, or, using standard sequence notation, $\{2^{n-1}\}_{n=1}^{\infty}$, but it is usually more convenient to think of it as the function $n \to 2^n$ from the non-negative integers to the reals and to denote it $\{2^n\}_{n=0}^{\infty}$. Similarly, the sequence

$8/3,\ 4,\ 32/5,\ 32/3,\ 128/7, \ldots$

can be described as the sequence $\left\{\dfrac{2^{n+2}}{n+2}\right\}_{n=1}^{\infty}$, but it may be more convenient to describe it as $\left\{\dfrac{2^n}{n}\right\}_{n=3}^{\infty}$. Passing from one notation to the other is a change of variables in the index; that is, $n$ is replaced by $n - 2$ and the starting point for the sequence is changed from $n = 1$ to $n = 3$ (since $n - 2$ is 1 when $n$ is 3).
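The change of index described above can be confirmed by listing the first few terms under both descriptions. The sketch is purely illustrative and uses exact fractions so the terms appear in lowest terms.

```python
from fractions import Fraction

# Terms 2^(n+2)/(n+2) with the index starting at n = 1 ...
indexed_from_1 = [Fraction(2 ** (n + 2), n + 2) for n in range(1, 6)]
# ... and terms 2^n/n with the index starting at n = 3.
indexed_from_3 = [Fraction(2 ** n, n) for n in range(3, 8)]

print([str(t) for t in indexed_from_1])
print("same sequence:", indexed_from_1 == indexed_from_3)
```

The two lists coincide because substituting $n - 2$ for $n$ in the first formula gives the second, while the starting index moves from 1 to 3.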


Limits of Sequences. A sequence $\{a_n\}$ converges to a number $a$ if the distance from $a_n$ to $a$ can be made less than any given positive number by insisting that $n$ be sufficiently large. More precisely:

Definition 2.1.4. A sequence $\{a_n\}$ of real numbers is said to converge to the number $a$, or have limit equal to $a$, if, for each $\epsilon > 0$, there is a real number $N$ such that

$|a_n - a| < \epsilon$ whenever $n > N$.

In this case, we will write $\lim_{n \to \infty} a_n = a$ or $\lim a_n = a$ or simply $a_n \to a$.

Remark 2.1.5. If we compare what would be required by the above definition for $\lim a_n = a$ and what would be required for $\lim |a_n - a| = 0$, then we find that the requirements are identical. Thus, $a_n \to a$ if and only if $|a_n - a| \to 0$.


The limit of a sequence (if it exists) is well defined — that is, a sequence cannot 


have more than one limit. 


Theorem 2.1.6. If $a_n \to a$ and $a_n \to b$, then $a = b$.

Proof. If $a_n \to a$ and $a_n \to b$, then, for each $\epsilon > 0$ there are numbers $N_1$ and $N_2$ such that

$n > N_1$ implies $|a_n - a| < \epsilon/2$ and
$n > N_2$ implies $|a_n - b| < \epsilon/2$.

If $n$ is an integer larger than both $N_1$ and $N_2$, then

$|b - a| = |(a_n - a) + (b - a_n)| \le |a_n - a| + |b - a_n| < \epsilon/2 + \epsilon/2 = \epsilon$.

This implies that $|b - a|$ is smaller than every positive number $\epsilon$. Since $|b - a| \ge 0$, this is possible only if $|b - a| = 0$, that is, only if $a = b$. (In this argument we used an important property of the real number system without comment. In Exercise 2.1.12 you are asked to figure out what property that is.) □


Finding the limit of a sequence often involves two steps: (1) make a good 
intuitive guess as to what the limit should be and (2) prove that your guess is 
correct by using the above definition or theorems that have been proved using it. 
The following example illustrates the first of these steps. 


Example 2.1.7. Make an educated guess as to what the limits are for the following sequences:

(a) $\{1/n\}$;
(b) $\left\{\dfrac{n}{2n+1}\right\}$;
(c) $\{(-1)^n\}$;
(d) $\left\{\sqrt{4 + 1/n}\right\}$.

Solution: (a) The larger $n$ becomes, the smaller $1/n$ becomes. Thus, it appears that $\lim 1/n = 0$.

(b) If we divide the numerator and denominator of $\dfrac{n}{2n+1}$ by $n$, the result is $\dfrac{1}{2 + 1/n}$. If $1/n \to 0$, then it should be the case that $\dfrac{1}{2 + 1/n} \to 1/2$. Thus, we choose 1/2 as our guess.

(c) Since the sequence $\{(-1)^n\}$ alternates between $-1$ and 1, it does not appear to converge to any one number. Thus, we guess that it does not converge.

(d) If $1/n \to 0$, then it should be the case that $\sqrt{4 + 1/n} \to \sqrt{4} = 2$. Thus, our guess is 2.

Example 2.1.8. Use the definition of limit to verify that the guesses in the pre- 
ceding example are correct: 

Solution: (a) Given ¢ > 0, we must show that there is an N such that n > N 
implies 1/n < ¢. However, since 1/n < ¢ if and only if n > 1/e, if we choose 
N = 1/e, then, indeed, n > N implies 1/n < 

(i) Given € > 0, we must show that there is an N such that 


n 
>N lies =}|——— — 1/2] <e. 
Some work with the expression in absolute values shows us how to do this: 
HE Sig st Dito 1 a eed 
Qn+1 dn +2 4n+2 > dn 
n 1 , 1 : 
Thus, — 1/2| < © whenever — < ¢ — that is, whenever n > —. Thus, it 
Qn+1 An de 


suffices to choose N = + 
de 

(c) We will show that there is no number $a$ which satisfies the definition of the statement $\lim(-1)^n = a$. Let $a$ be any real number and choose $\epsilon = 1/2$. If $\lim(-1)^n = a$, then there must be an $N$ such that
$$n > N \quad\text{implies}\quad |(-1)^n - a| < 1/2.$$
Since there are both even and odd integers $n > N$, this means that
$$|1 - a| < 1/2 \quad\text{and}\quad |-1 - a| < 1/2.$$
Then the triangle inequality (Theorem 2.1.2(a)) implies
$$2 = |1 - a + 1 + a| \le |1 - a| + |1 + a| = |1 - a| + |-1 - a| < 1/2 + 1/2 = 1.$$
Since it is not true that $2 < 1$, our assumption that $\lim(-1)^n = a$ must be false. Since this is the case no matter what real number we choose for $a$, we conclude that $\{(-1)^n\}$ has no limit. (Once again, as in the proof of Theorem 2.1.6, we used here, without comment, a special property of the real number system. Exercise 2.1.12 asks you to state what property that is.)
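(d) Given $\epsilon > 0$, we must show there is an $N$ such that
$$n > N \quad\text{implies}\quad \left|\sqrt{4 + 1/n} - 2\right| < \epsilon.$$
We simplify this problem by rationalizing the positive expression $\sqrt{4 + 1/n} - 2$:
$$\left|\sqrt{4 + 1/n} - 2\right| = \frac{(\sqrt{4 + 1/n} - 2)(\sqrt{4 + 1/n} + 2)}{\sqrt{4 + 1/n} + 2} = \frac{4 + 1/n - 4}{\sqrt{4 + 1/n} + 2} = \frac{1/n}{\sqrt{4 + 1/n} + 2} < \frac{1/n}{\sqrt{4} + 2} = \frac{1}{4n}. \tag{2.1.2}$$
Thus, if $N = 1/(4\epsilon)$, then $n > N$ implies $\left|\sqrt{4 + 1/n} - 2\right| < \epsilon$.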
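The choices of $N$ made in parts (a), (b), and (d) can be spot-checked numerically. The following is only a sketch under stated assumptions: the helper name `check_bound` and the sample value of $\epsilon$ are illustrative and not part of the text, and a finite check of this kind illustrates the definition without proving anything.

```python
import math
from fractions import Fraction

def check_bound(term, limit, N, eps, how_many=1000):
    """Return True if |term(n) - limit| < eps for the first `how_many` integers n > N."""
    start = math.floor(N) + 1  # smallest integer strictly greater than N
    return all(abs(term(n) - limit) < eps for n in range(start, start + how_many))

eps = Fraction(1, 100)  # illustrative tolerance (an assumption, any eps > 0 would do)

# (a) a_n = 1/n, limit 0, N = 1/eps
ok_a = check_bound(lambda n: Fraction(1, n), 0, 1 / eps, eps)

# (b) a_n = n/(2n+1), limit 1/2, N = 1/(4 eps)
ok_b = check_bound(lambda n: Fraction(n, 2 * n + 1), Fraction(1, 2), 1 / (4 * eps), eps)

# (d) a_n = sqrt(4 + 1/n), limit 2, N = 1/(4 eps); floating point is used here
ok_d = check_bound(lambda n: math.sqrt(4 + 1 / n), 2.0, 1 / (4 * eps), float(eps))

# A value of N that is too small fails the same check for a_n = 1/n.
too_small = check_bound(lambda n: Fraction(1, n), 0, 10, eps, how_many=5)
```

Exact rational arithmetic is used in (a) and (b) so that the strict inequalities are tested without rounding error.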


Exercise Set 2.1 


1. Show that
(a) if $|x - 5| < 1$, then $x$ is a number greater than 4 and less than 6;
(b) if $|x - 3| < 1/2$ and $|y - 3| < 1/2$, then $|x - y| < 1$;
(c) if $|x - a| < 1/2$ and $|y - b| < 1/2$, then $|x + y - (a + b)| < 1$.

2. Use the triangle inequality to prove that there is no number $x$ which satisfies both $|x - 1| < 1/2$ and $|x - 2| < 1/2$.

3. Put each of the following sequences in the form $a_1, a_2, a_3, \dots, a_n, \dots$. This requires that you compute the first 3 terms and find an expression for the $n$th term.

(a) The sequence of positive odd integers. 


(b) The sequence defined inductively by #1 = 1 and enyi = 


an 
n+l 


(c) The sequence defined inductively by e1 = 1 and eny1 = 


In each of the next five exercises, first make an educated guess as to what you think 
the limit is. Then use the definition of limit to prove that your guess is correct. 


4. lim 1/n?. 


. In-1 
8. lim 
6. lim(-1)"/n. 
8 n 
7. lim So 
8. $\lim(\sqrt{n+1} - \sqrt{n})$.
9. Prove that lim(1/n + (—1)"/n?) = 0. 


10. Prove that $\lim 2^{-n} = 0$. Hint: Prove first that $2^n > n$ for all natural numbers $n$.


11. Prove that if $a_n \to 0$ and $k$ is any constant, then $ka_n \to 0$.


12. In the proof of Theorem 2.1.6 we failed to point out that one step is true only 
because we are working in the real number system and not some other ordered 
field. What special property of the real number system makes this argument 
work? This same property is also used without comment in Example 2.1.8, 
solution, part (c). 


2.2. Using the Definition of Limit


It is important that mathematics students become comfortable with the notion 
of limit of a sequence. Unfortunately, it is a difficult concept to grasp. Students 
almost always have difficulty with it at first and learn to understand it only through 
repeated exposure and extensive practice in its use. This section is designed to 
provide some of this practice. 




Using Identities and Inequalities. In each of the following examples, we wish to show that a certain sequence $\{a_n\}$ has limit $a$. The strategy for doing this, in each case, is to use identities and inequalities on the expression $|a_n - a|$ until we can show that it is less than or equal to some much simpler expression in $n$ that can clearly be made less than any given $\epsilon$ by choosing $n$ large enough.

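Example 2.2.1. Prove that $\lim \dfrac{n}{2n-3} = 1/2$.

Solution: We have
$$\left|\frac{n}{2n-3} - \frac{1}{2}\right| = \left|\frac{2n - 2n + 3}{4n - 6}\right| = \frac{3}{|4n - 6|}.$$
Now $4n - 6 = n + (3n - 6) \ge n$ whenever $n > 1$. Thus,
$$\left|\frac{n}{2n-3} - \frac{1}{2}\right| = \frac{3}{4n - 6} \le \frac{3}{n}$$
provided $n > 1$. Given $\epsilon > 0$, if we choose $N = \max\{1, 3/\epsilon\}$, then
$$\left|\frac{n}{2n-3} - \frac{1}{2}\right| \le \frac{3}{n} < \epsilon \quad\text{whenever}\quad n > N.$$
This completes the proof that $\lim \dfrac{n}{2n-3} = 1/2$.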
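The chain of estimates in Example 2.2.1 can be traced term by term in exact arithmetic. This is a minimal sketch, assuming the hypothetical helper name `chain_holds`; it tests the identity and the two inequalities for finitely many $n > N$ and is an illustration rather than a proof.

```python
import math
from fractions import Fraction

def chain_holds(eps, how_many=500):
    """Check |n/(2n-3) - 1/2| = 3/(4n-6) <= 3/n < eps for the first `how_many` n > N."""
    eps = Fraction(eps)
    N = max(Fraction(1), 3 / eps)      # the N chosen in the example
    start = math.floor(N) + 1          # smallest integer strictly greater than N
    for n in range(start, start + how_many):
        gap = abs(Fraction(n, 2 * n - 3) - Fraction(1, 2))
        if gap != Fraction(3, 4 * n - 6):    # the identity in the first display
            return False
        if not (gap <= Fraction(3, n) < eps):  # the two inequalities
            return False
    return True

small_eps_ok = chain_holds(Fraction(1, 10))  # here N = 30
large_eps_ok = chain_holds(5)                # here N = 1, so the check starts at n = 2
```

The second call shows why the example takes a maximum: for large $\epsilon$ the requirement $n > 1$ is the binding one.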


Example 2.2.2. Prove that $\lim(2 + 1/n)^2 = 4$.

Solution: We have
$$|(2 + 1/n)^2 - 4| = |2 + 1/n + 2|\,|2 + 1/n - 2| = \frac{4 + 1/n}{n} \le \frac{5}{n}.$$
Thus, given $\epsilon > 0$, if we set $N = 5/\epsilon$, we have
$$|(2 + 1/n)^2 - 4| \le \frac{5}{n} < \epsilon \quad\text{whenever}\quad n > N.$$
This proves that $\lim(2 + 1/n)^2 = 4$.


Using Information About a Limit. Knowing that a sequence converges or that it converges to a specific number always provides a great deal of other information. We give some examples below.


Theorem 2.2.3. If $\lim a_n = a$ and $a < c$, then there exists an $N$ such that
$$a_n < c \quad\text{for all}\quad n > N.$$
Similarly, if $b < a$, then there is an $N$ such that
$$b < a_n \quad\text{for all}\quad n > N.$$

Proof. If $a < c$, then $c - a > 0$. Since $\lim a_n = a$, for each $\epsilon > 0$, there is an $N$ such that
$$|a_n - a| < \epsilon \quad\text{whenever}\quad n > N.$$
If we use this in the case where $\epsilon = c - a$, it tells us there is an $N$ such that
$$|a_n - a| < c - a \quad\text{whenever}\quad n > N.$$
This implies
$$a - c + a < a_n < a + c - a \quad\text{whenever}\quad n > N,$$
by Theorem 2.1.1(b). Thus, $a_n < c$ for all $n > N$.

The second statement of the theorem is proved in the same way. □


A sequence $\{a_n\}$ is bounded above (or below) if the set of numbers which appear as terms of $\{a_n\}$ is bounded above (or below) as a set of numbers. A sequence which is bounded above and bounded below is simply said to be bounded.

The following corollary follows directly from the preceding theorem. We leave 
the details to the exercises. 


Corollary 2.2.4. If a sequence $\{a_n\}$ converges, then it is bounded.


Theorem 2.2.5. If $\{a_n\}$ is a sequence and $\lim a_n = a$, then $\lim |a_n| = |a|$.

Proof. We use the second form of the triangle inequality (Theorem 2.1.2(b)) to write
$$\big||a_n| - |a|\big| \le |a_n - a|. \tag{2.2.1}$$
Since $\lim a_n = a$, given $\epsilon > 0$, there is an $N$ such that
$$|a_n - a| < \epsilon \quad\text{whenever}\quad n > N.$$
Then, by (2.2.1), it is also true that
$$\big||a_n| - |a|\big| < \epsilon \quad\text{whenever}\quad n > N.$$
Thus, $\lim |a_n| = |a|$. □
Example 2.2.6. For a sequence $\{a_n\}$ with $\lim a_n = a$, prove $\lim a_n^2 = a^2$.

Solution: We first note that
$$|a_n^2 - a^2| = |a_n + a|\,|a_n - a| \le (|a_n| + |a|)\,|a_n - a|. \tag{2.2.2}$$
We know that $\lim |a_n| = |a|$ by the previous theorem. Since $|a| < |a| + 1$, Theorem 2.2.3 implies that there is an $N_1$ such that $|a_n| < |a| + 1$ for all $n > N_1$. This and (2.2.2) together imply that
$$|a_n^2 - a^2| \le (2|a| + 1)\,|a_n - a| \quad\text{whenever}\quad n > N_1.$$
Given $\epsilon > 0$, we choose $N_2$ such that $|a_n - a| < \dfrac{\epsilon}{2|a| + 1}$ whenever $n > N_2$. We can do this because $\lim a_n = a$. If we set $N = \max(N_1, N_2)$, then
$$|a_n^2 - a^2| < \epsilon \quad\text{whenever}\quad n > N.$$
Hence, $\lim a_n^2 = a^2$.




An Equivalent Definition of Limit. The following theorem rephrases the definition of limit in a way that may provide some additional insight.


Theorem 2.2.7. A sequence $\{a_n\}$ converges to $a$ if and only if, for each $\epsilon > 0$, there are only finitely many $n$ for which $|a_n - a| \ge \epsilon$.

Proof. Given $\epsilon > 0$, set
$$A_\epsilon = \{n \in \mathbb{N} : |a_n - a| \ge \epsilon\}.$$
If $\lim a_n = a$ and $\epsilon > 0$, there is an $N$ such that $|a_n - a| < \epsilon$ whenever $n > N$. This means that $A_\epsilon$ is contained in the set $\{1, 2, \dots, N\}$ and, hence, is finite.

Conversely, suppose that, for each $\epsilon > 0$, the set $A_\epsilon$ is finite. Then given $\epsilon > 0$, the set $A_\epsilon$ has a largest element $N$. This means $n \notin A_\epsilon$ if $n > N$, that is, $|a_n - a| < \epsilon$ if $n > N$. This implies that $\lim a_n = a$. □


Negating the Limit Definition. What does it mean for it not to be true that $\lim a_n = a$? That is, what is the negation of the statement "for each $\epsilon > 0$ there is an $N$ such that $|a_n - a| < \epsilon$ whenever $n > N$"? If it is not true that for each $\epsilon > 0$, there is an $N$ such that ..., then for some $\epsilon > 0$, there is no $N$ such that .... If we fill in the dots, we get the following statement:

The sequence $\{a_n\}$ does not converge to $a$ if and only if for some $\epsilon > 0$ there is no $N$ such that $|a_n - a| < \epsilon$ for all $n > N$.

We may rephrase the second half of this statement to obtain:

The sequence $\{a_n\}$ does not converge to $a$ if and only if for some $\epsilon > 0$ and for every $N$ there is an $n > N$ such that $|a_n - a| \ge \epsilon$.

Negating the equivalent definition of limit given in Theorem 2.2.7 yields a somewhat simpler statement:

The sequence $\{a_n\}$ does not converge to $a$ if and only if for some $\epsilon > 0$ there are infinitely many $n \in \mathbb{N}$ for which $|a_n - a| \ge \epsilon$.
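The set of exceptional indices that appears in Theorem 2.2.7 and in its negation can be listed directly for concrete sequences. The sketch below is illustrative only: the function name `exceptional_indices`, the two sample sequences, and the search window of 1000 terms are assumptions, and a finite window can suggest, but never establish, that a set of indices is finite or infinite.

```python
from fractions import Fraction

def exceptional_indices(term, limit, eps, n_max):
    """List the n in 1..n_max with |term(n) - limit| >= eps."""
    return [n for n in range(1, n_max + 1) if abs(term(n) - limit) >= eps]

# a_n = (-1)^n / n with candidate limit 0 and eps = 1/10:
# |a_n| >= 1/10 exactly when n <= 10, so the exceptional set stops growing.
finite_case = exceptional_indices(lambda n: Fraction((-1) ** n, n), 0, Fraction(1, 10), 1000)

# a_n = (-1)^n with candidate limit 1 and eps = 1/2:
# every odd n is exceptional, so the set keeps growing with the window.
growing_case = exceptional_indices(lambda n: (-1) ** n, 1, Fraction(1, 2), 1000)
```

In the first case enlarging the window adds no new indices; in the second every enlargement adds more, which is the behavior described in the negated statement.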


Example 2.2.8. Show that the sequence {2~" + (1+(—1)")2~°*} does not converge 
to 0. 

Solution: Try computing a few terms of this sequence on a calculator. It 
appears to be converging to 0. However, if we choose « = 2~*, then for every even 
neN 

j2-" + (1+ (-1)")2- 8 - 0] = 2" 42-2°% D2. 
Since this inequality holds for infinitely many n, the sequence does not converge 
to 0. 


Exercise Set 2.2 


In each of the following six exercises, first make an educated guess as to what you 
think the limit is. Then use the definition of limit to prove that your guess is 
correct. 


Bn? 2 
1. lim 
eal 
fy on 
2 lim Sy 


n 2 
1. lim : 
n+1 
. $\lim(\sqrt{n^2 + n} - n)$.
. $\lim(1 + 1/n)^3$.


. Prove Corollary 


4. 
3 


. Prove that if $\lim a_n = a$, then $\lim a_n^3 = a^3$.


5. 
6. 
tf 
8. 
9. 


. Does the sequence $\{\cos(n\pi/3)\}$ have a limit? Justify your answer.

. Give an example of a sequence $\{a_n\}$ which does not converge but for which the sequence $\{|a_n|\}$ does converge.


11. Prove that if $\{a_n\}$ and $\{b_n\}$ are sequences with $|a_n| \le b_n$ for all $n$ and if $\lim b_n = 0$, then $\lim a_n = 0$ also.


12. Prove the following partial converse to Theorem 2.2.3: Suppose $\{a_n\}$ is a convergent sequence. If there is an $N$ such that $a_n < c$ for all $n > N$, then $\lim a_n \le c$. Also, if there is an $N$ such that $b < a_n$ for all $n > N$, then $b \le \lim a_n$.


13. Use the result of the preceding exercise to prove that an interval I is closed if 
and only if each sequence in J that converges actually converges to a point of I. 


14. Prove Corollary 2.2.4. That is, prove that a convergent sequence is bounded.


15. For a certain sequence $\{a_n\}$ there is an $\epsilon > 0$ such that every millionth term of the sequence $\{a_n\}$ is greater than $\epsilon$. Can such a sequence converge to 0? Justify your answer.


2.3. Limit Theorems 


We reiterate that the strategy to use in proving a statement of the form
$$\lim a_n = a$$
directly from the definition is to use a string of identities and inequalities to conclude that $|a_n - a|$ is less than or equal to a simpler expression in $n$ that we can easily force to be less than $\epsilon$ by making $n$ sufficiently large. This strategy was used throughout the previous two sections. The following theorem formalizes this strategy in a way that will lead us to use the right approach to many limit proofs.


Theorem 2.3.1. Let $\{a_n\}$ and $\{b_n\}$ be sequences of real numbers and suppose $\lim b_n = 0$. If $a \in \mathbb{R}$ and if there is an $N_1$ such that
$$|a_n - a| \le b_n \quad\text{for all}\quad n > N_1, \tag{2.3.1}$$
then $\lim a_n = a$.

Proof. Since $\lim b_n = 0$, given any $\epsilon > 0$, there is an $N_2$ such that
$$b_n = |b_n - 0| < \epsilon \quad\text{whenever}\quad n > N_2.$$
It now follows from (2.3.1) that
$$|a_n - a| < \epsilon \quad\text{whenever}\quad n > N = \max\{N_1, N_2\}.$$
Thus, $\lim a_n = a$. □
Of course, to prove that $\lim a_n = a$ using this theorem one must establish an inequality of the form (2.3.1), where $\{b_n\}$ is a sequence of non-negative terms that we know converges to 0. The proof of the next theorem uses this technique. The proof is easy and is left to the exercises.


A sequence $\{b_n\}$ for which there is a number $k$ such that $b_n \le k$ for all $n$ is said to be bounded above. If there is a number $m$ such that $m \le b_n$ for all $n$, then the sequence is said to be bounded below. A sequence which is bounded above and below is simply said to be bounded. Note that a sequence $\{b_n\}$ is bounded if and only if $\{|b_n|\}$ is bounded above (Exercise 2.3.6). Recall from Corollary 2.2.4 that convergent sequences are bounded.


Theorem 2.3.2. Let $\{a_n\}$ be a sequence of real numbers such that $\lim a_n = 0$, and let $\{b_n\}$ be a bounded sequence. Then $\lim a_n b_n = 0$.


The following theorem is often called the squeeze principle. 


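Theorem 2.3.3. If $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$ are sequences for which there is a number $K$ such that
$$b_n \le a_n \le c_n \quad\text{for all}\quad n > K$$
and if $b_n \to a$ and $c_n \to a$, then $a_n \to a$.

Proof. Since $b_n \to a$ and $c_n \to a$, given $\epsilon > 0$, there are numbers $N_1$ and $N_2$ such that
$$a - \epsilon < b_n < a + \epsilon \quad\text{for all}\quad n > N_1 \quad\text{and}$$
$$a - \epsilon < c_n < a + \epsilon \quad\text{for all}\quad n > N_2. \tag{2.3.2}$$
Then for $n > N = \max\{N_1, N_2, K\}$ we have
$$a - \epsilon < b_n \le a_n \le c_n < a + \epsilon.$$
This implies $|a_n - a| < \epsilon$. Thus, $\lim a_n = a$. □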
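A standard illustration of the squeeze principle, not taken from the text above, is the sequence $\sin(n)/n$, which is trapped between $-1/n$ and $1/n$. The sketch below checks the two-sided bound in floating point for finitely many $n$; the function name `squeezed` and the range of $n$ are assumptions made for illustration.

```python
import math

def squeezed(n_max=10000):
    """Check -1/n <= sin(n)/n <= 1/n for n = 1..n_max."""
    for n in range(1, n_max + 1):
        lower, middle, upper = -1 / n, math.sin(n) / n, 1 / n
        if not (lower <= middle <= upper):
            return False
    return True

bound_holds = squeezed()
late_term = abs(math.sin(10000) / 10000)  # no larger than 1/10000
```

Both bounding sequences tend to 0, so Theorem 2.3.3 gives $\sin(n)/n \to 0$ once the inequality is known for all $n$, which follows from $|\sin n| \le 1$.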




Example 2.3.4. Prove that if $\{a_n\}$ is a sequence of positive numbers converging to a positive number $a$, then $\lim \sqrt{a_n} = \sqrt{a}$.

Solution: We will use Theorem 2.3.1. Rationalizing the numerator gives us
$$|\sqrt{a_n} - \sqrt{a}| = \frac{|a_n - a|}{\sqrt{a_n} + \sqrt{a}} \le \frac{1}{\sqrt{a}}\,|a_n - a|.$$
Since $a_n \to a$, Remark 2.1.5 implies $|a_n - a| \to 0$. Then Theorems 2.3.2 and 2.3.1 imply $\sqrt{a_n} \to \sqrt{a}$.
Example 2.3.5. Prove that if $|a| < 1$, then $\lim a^n = 0$.

Solution: The result is trivial in the case $a = 0$. If $a \ne 0$, we set $b = |a|^{-1} - 1$. Then $b > 0$ and $|a|^{-1} = 1 + b$. We use the Binomial Theorem (Theorem 1.2.12) to expand $|a|^{-n} = (1 + b)^n$:
$$(1 + b)^n = 1 + nb + \frac{n(n-1)}{2}\,b^2 + \cdots + b^n.$$
Since all the terms involved are positive, it follows that $|a|^{-n} = (1 + b)^n > nb$. Inverting this yields
$$|a^n| = |a|^n < \frac{1}{nb}.$$
Since $1/n \to 0$, it follows from Theorems 2.3.2 and 2.3.1 that $a^n \to 0$.
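The key estimate of Example 2.3.5, $|a|^n < 1/(nb)$ with $b = |a|^{-1} - 1$, can be tested in exact arithmetic for sample values of $a$. This is a sketch under stated assumptions: the function name `geometric_bound_holds` and the sample values are illustrative, and the argument must satisfy $0 < |a| < 1$ (or be 0).

```python
from fractions import Fraction

def geometric_bound_holds(a, n_max=200):
    """Check |a|^n < 1/(n*b), where b = 1/|a| - 1, for n = 1..n_max (requires |a| < 1)."""
    a = Fraction(a)
    if a == 0:
        return True                 # the trivial case noted in the example
    b = 1 / abs(a) - 1              # positive because |a| < 1
    return all(abs(a) ** n < 1 / (n * b) for n in range(1, n_max + 1))

close_to_one = geometric_bound_holds(Fraction(9, 10))   # b = 1/9, bound is 9/n
negative_base = geometric_bound_holds(Fraction(-1, 2))  # b = 1, bound is 1/n
```

For $a$ close to 1 the bound $1/(nb)$ is weak for small $n$, yet it still tends to 0, which is all that Theorem 2.3.1 requires.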


The Main Limit Theorem. This is the theorem that tells us that the limit concept behaves well with regard to the usual algebraic operations.


Theorem 2.3.6. Suppose $a_n \to a$, $b_n \to b$, $c$ is a real number, and $k$ is a natural number. Then

(a) $ca_n \to ca$;
(b) $a_n + b_n \to a + b$;
(c) $a_n b_n \to ab$;
(d) $a_n/b_n \to a/b$ if $b \ne 0$ and $b_n \ne 0$ for all $n$;
(e) $a_n^k \to a^k$;
(f) $a_n^{1/k} \to a^{1/k}$ if $a_n \ge 0$ for all $n$.


Proof. Part (a) follows immediately from Theorem 2.3.2 applied to the sequence $\{c(a_n - a)\}$. We will prove (c) and (e) and leave (b), (d), and (f) to the exercises.

(c) We use the strategy suggested by Theorem 2.3.1. We have
$$|a_n b_n - ab| = |a_n b_n - a b_n + a b_n - ab| \le |a_n - a|\,|b_n| + |a|\,|b_n - b|,$$
by the triangle inequality. Furthermore, we have that $\{b_n\}$ is bounded by Corollary 2.2.4, and so $\{|b_n|\}$ is bounded above. We also have $|a_n - a| \to 0$, by Remark 2.1.5. Therefore, by Theorem 2.3.2, $|a_n - a|\,|b_n| \to 0$. By part (a), $|a|\,|b_n - b| \to 0$. By part (b) the sum $|a_n - a|\,|b_n| + |a|\,|b_n - b|$ converges to 0 and, hence, $a_n b_n \to ab$ by Theorem 2.3.1.

(e) We use the identity
$$a_n^k - a^k = (a_n - a)(a_n^{k-1} + a_n^{k-2}a + a_n^{k-3}a^2 + \cdots + a^{k-1}) = (a_n - a)\,b_n,$$
where
$$b_n = a_n^{k-1} + a_n^{k-2}a + a_n^{k-3}a^2 + \cdots + a^{k-1}.$$
Now, because the sequence $\{a_n\}$ converges, it is bounded and, hence, $\{|a_n|\}$ is bounded above. We choose an upper bound $m$ for $\{|a_n|\}$ which also satisfies $|a| \le m$. Then
$$|b_n| \le k m^{k-1}.$$
Since $k$ and $m$ are fixed, the sequence $\{|b_n|\}$ is bounded above.

We conclude from Theorem 2.3.2 that $|a_n - a|\,|b_n| \to 0$ and from Theorem 2.3.1 that $a_n^k \to a^k$. □
Example 2.3.7. Use the Main Limit Theorem to find $\lim \dfrac{n^2 + 3n + 1}{3n^2 - 7n + 2}$.

Solution: In a problem of this type, we divide the numerator and denominator by the highest power of $n$ that appears in either one. In this case, that is the second power. The result is
$$\frac{1 + 3/n + 1/n^2}{3 - 7/n + 2/n^2}.$$
The Main Limit Theorem then tells us that
$$\begin{aligned}
\lim \frac{1 + 3/n + 1/n^2}{3 - 7/n + 2/n^2} &= \frac{\lim(1 + 3(1/n) + (1/n)^2)}{\lim(3 - 7(1/n) + 2(1/n)^2)} \\
&= \frac{1 + 3\lim(1/n) + \lim(1/n)^2}{3 - 7\lim(1/n) + 2\lim(1/n)^2} \\
&= \frac{1 + 3\lim(1/n) + (\lim 1/n)^2}{3 - 7\lim(1/n) + 2(\lim 1/n)^2} \\
&= \frac{1 + 3\cdot 0 + 0^2}{3 - 7\cdot 0 + 2(0)^2} = \frac{1}{3}.
\end{aligned} \tag{2.3.3}$$
Here, we didn't explicitly refer to the parts of the Main Limit Theorem as we used them, but it is clear that the first equality uses (d), the second (a) and (b), the third (e), and the fourth the fact that $\lim 1/n = 0$ (Example 2.1.8).
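The value $1/3$ found in Example 2.3.7 can be compared with the terms of the sequence at a few large indices. This is a minimal sketch, assuming the hypothetical helper name `quotient` and the sample indices $n = 10^k$; it shows the gap to $1/3$ shrinking but is not a substitute for the Main Limit Theorem.

```python
from fractions import Fraction

def quotient(n):
    """Exact value of (n^2 + 3n + 1) / (3n^2 - 7n + 2) for an integer n."""
    n = Fraction(n)
    # The denominator factors as (3n - 1)(n - 2), so the term is undefined at n = 2;
    # the sample indices used below avoid that single value.
    return (n ** 2 + 3 * n + 1) / (3 * n ** 2 - 7 * n + 2)

# Distance from 1/3 at n = 10, 100, ..., 10^6.
gaps = [abs(quotient(10 ** k) - Fraction(1, 3)) for k in range(1, 7)]
```

Subtracting $1/3$ by hand gives $(16n + 1)/(3(3n^2 - 7n + 2))$, which behaves like a constant times $1/n$ and explains the steady decrease of the computed gaps.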


Theorem 2.3.8. If $\{a_n\}$ and $\{b_n\}$ are convergent sequences converging to $a$ and $b$, respectively, and if there is a number $K$ such that $a_n \le b_n$ whenever $n > K$, then $a \le b$.

Proof. The sequence $c_n = b_n - a_n$ is a sequence with $b - a$ as its limit and with terms that are non-negative for $n > K$. If $b - a$ were negative, then Theorem 2.2.3 would imply $b_n - a_n < 0$ for all sufficiently large $n$. Since this is not the case, we conclude that $a \le b$. □


Exercise Set 2.3 


3m 
1. Use the Main Limit Theorem to find lim =" —" a 
0 


+n? +6" 
ee tee ; n—5 
2. Use the Main Limit Theorem to find lim —— >. 
nF + In? +5 
gn 


3. Use the Main Limit Theorem to find lim <-—. 
FI 


4. Prove that $\lim \dfrac{\sin n}{n} = 0$.

5. Prove Theorem 2.3.2.


6. Prove that a sequence $\{a_n\}$ is both bounded above and bounded below if and only if its sequence of absolute values $\{|a_n|\}$ is bounded above.

7. Prove part (b) of Theorem 2.3.6. 

8. Prove that if $\{b_n\}$ is a sequence of positive terms and $b_n \to b > 0$, then there is a number $m > 0$ such that $b_n \ge m$ for all $n$.


9. Prove part (d) of Theorem 2.3.6. Hint: Use the previous exercise.

10. Prove part (f) of Theorem 2.3.6. Hint: Use the identity
$$x^k - y^k = (x - y)(x^{k-1} + x^{k-2}y + \cdots + y^{k-1})$$
with $x = a_n^{1/k}$ and $y = a^{1/k}$.


11. For each natural number $n$, let $b_n = n^{1/n} - 1$. Then $b_n$ is non-negative and $n = (1 + b_n)^n$. Use the Binomial Theorem (Theorem 1.2.12) to prove that $n \ge \dfrac{n(n-1)}{2}\,b_n^2$ and, hence, that $b_n \le \sqrt{\dfrac{2}{n-1}}$ for $n > 1$.


12. Prove that $\lim n^{1/n} = 1$. Hint: Use the result of the previous exercise.

13. Prove that if $a > 0$, then $\lim a^{1/n} = 1$. Hint: Do this first for $a \ge 1$; use the result of the previous exercise and the squeeze principle.


2.4. Monotone Sequences 


A sequence of real numbers $\{a_n\}$ is said to be non-decreasing if $a_n \le a_{n+1}$ for each $n$. The sequence is said to be non-increasing if $a_n \ge a_{n+1}$ for each $n$. If it is one or the other (either non-decreasing or non-increasing), the sequence is said to be monotone.


Convergence of Monotone Sequences. In this section and the next, we will 
develop powerful tools for proving that a sequence converges. These tools work even 
in situations where we have no idea what the limit might be. It is the completeness 
axiom for the real number system that makes these results possible. 


Theorem 2.4.1 (Monotone Convergence Theorem). Each bounded monotone 
sequence converges. 


Proof. A non-decreasing sequence $\{a_n\}$ is bounded if and only if it is bounded above, since it is automatically bounded below by $a_1$. Similarly, a non-increasing sequence is bounded if and only if it is bounded below.

We will prove that every non-decreasing sequence that is bounded above converges. The proof that every non-increasing sequence that is bounded below converges is the same but with all the inequalities reversed.

Thus, suppose $\{a_n\}$ is non-decreasing and bounded above. Then the set
$$A = \{a_n : n \in \mathbb{N}\}$$
is a non-empty set which is bounded above. By the completeness axiom C, this set has a least upper bound $a$. That is,
$$\sup_n a_n = \sup A = a$$
is finite. We will show that $a$ is the limit of the sequence $\{a_n\}$.

Given $\epsilon > 0$, the number $a - \epsilon$ is less than $a$ and so it is not an upper bound for $A$. This means there is some natural number $N$ such that $a - \epsilon < a_N$. If $n > N$, then $a_N \le a_n$ since $\{a_n\}$ is a non-decreasing sequence. This implies $a - \epsilon < a_n$. We also have $a_n \le a < a + \epsilon$, since $a$ is an upper bound for $\{a_n\}$. Combining these inequalities yields
$$a - \epsilon < a_n < a + \epsilon \quad\text{for all}\quad n > N.$$
By Theorem 2.1.1(b), this is equivalent to
$$|a_n - a| < \epsilon \quad\text{for all}\quad n > N.$$
We conclude that $\lim a_n = a$. □


Example 2.4.2. Let a sequence be defined inductively by $a_1 = 0$ and
$$a_{n+1} = \frac{a_n + 1}{2}. \tag{2.4.1}$$
Prove that this sequence converges and find its limit.

Solution: This is a non-decreasing sequence (Exercise 1.2.13). Also, a simple induction argument shows that it is bounded above by 1. Therefore it is a bounded monotone sequence, and it converges by the previous theorem. Let $\lim a_n = a$. If we take the limit of both sides of (2.4.1), the result is $a = (a + 1)/2$, or $a/2 = 1/2$. Thus, $a = 1$.
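The recursion (2.4.1) is easy to iterate exactly, which makes the monotonicity and the bound by 1 visible in the first few terms. The sketch below uses a hypothetical helper name `averaging_terms`; it illustrates the hypotheses of the Monotone Convergence Theorem for finitely many terms only.

```python
from fractions import Fraction

def averaging_terms(count):
    """Return the first `count` terms of a_1 = 0, a_{n+1} = (a_n + 1)/2 as exact fractions."""
    a = Fraction(0)
    terms = [a]
    for _ in range(count - 1):
        a = (a + 1) / 2
        terms.append(a)
    return terms

terms = averaging_terms(10)  # 0, 1/2, 3/4, 7/8, ...
is_non_decreasing = all(x <= y for x, y in zip(terms, terms[1:]))
is_bounded_by_one = all(t < 1 for t in terms)
```

The computed terms follow the pattern $a_n = 1 - 2^{-(n-1)}$, which is consistent with the limit 1 obtained by passing to the limit in (2.4.1).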


A less trivial example is the following: 


Example 2.4.3. Let a sequence $\{a_n\}$ be defined inductively by $a_1 = 2$ and
$$a_{n+1} = \frac{a_n^2 + 2}{2a_n}. \tag{2.4.2}$$
Prove that this sequence converges and then find its limit.

Solution: We first note that a trivial induction argument shows that $a_n > 0$ for all $n$. This is true when $n = 1$ and it is true for $n + 1$ whenever it is true for $n$ by (2.4.2).

We will prove that the sequence is non-increasing. To show that $a_n \ge a_{n+1}$, we must show that $a_n \ge \dfrac{a_n^2 + 2}{2a_n}$. If we assume that $a_n > 0$, then we may multiply this inequality by $2a_n$ to obtain the equivalent inequality
$$2a_n^2 \ge a_n^2 + 2 \quad\text{or}\quad a_n^2 \ge 2.$$
We conclude that $a_n \ge a_{n+1}$ as long as $a_n$ is positive and $a_n^2 \ge 2$, that is, as long as $a_n \ge \sqrt{2}$. Now $a_1 = 2$ and so the sequence starts out with a number greater than or equal to $\sqrt{2}$. Every other number in this sequence has the form
$$\frac{x^2 + 2}{2x}$$
for some positive $x$. We claim that every such number is greater than or equal to $\sqrt{2}$. In fact
$$0 \le (x - \sqrt{2})^2 = x^2 - 2\sqrt{2}\,x + 2, \quad\text{and so}\quad 2\sqrt{2}\,x \le x^2 + 2.$$
This implies $\sqrt{2} \le \dfrac{x^2 + 2}{2x}$. Thus every $a_n$ is greater than or equal to $\sqrt{2}$.

We now know that the sequence $\{a_n\}$ is non-increasing and bounded below by $\sqrt{2}$. Thus, it is a bounded monotone sequence and has a limit by the previous theorem. Call the limit $a$. By (2.4.2), we have
$$2a_n a_{n+1} = a_n^2 + 2.$$
If we take the limit of both sides of this equation and note that $\lim a_n = \lim a_{n+1} = a$, then the result is
$$2a^2 = a^2 + 2 \quad\text{or}\quad a^2 = 2.$$
Since every $a_n$ is positive, $a \ge 0$. Thus, $a = \sqrt{2}$.
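The recursion (2.4.2) is the classical averaging iteration for $\sqrt{2}$ (it is what Newton's method gives for $x^2 - 2 = 0$). Iterating it with exact fractions lets the two facts used in the solution, $a_n^2 \ge 2$ and $a_n \ge a_{n+1}$, be checked without rounding. The helper name `sqrt2_terms` is an assumption made for this sketch.

```python
import math
from fractions import Fraction

def sqrt2_terms(count):
    """Return the first `count` terms of a_1 = 2, a_{n+1} = (a_n^2 + 2)/(2 a_n)."""
    a = Fraction(2)
    terms = [a]
    for _ in range(count - 1):
        a = (a * a + 2) / (2 * a)
        terms.append(a)
    return terms

terms = sqrt2_terms(6)  # 2, 3/2, 17/12, 577/408, ...
squares_at_least_two = all(t * t >= 2 for t in terms)          # a_n >= sqrt(2), tested as a_n^2 >= 2
is_non_increasing = all(x >= y for x, y in zip(terms, terms[1:]))
error_of_sixth_term = abs(float(terms[5]) - math.sqrt(2))
```

Testing $a_n^2 \ge 2$ instead of $a_n \ge \sqrt{2}$ keeps the comparison inside the rational numbers, so no floating-point approximation of $\sqrt{2}$ enters the check.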


Definition 2.4.4. If $\{a_n\}$ is a sequence of real numbers, then we say $\lim a_n = \infty$ if, for every real number $M$, there is a number $N$ such that
$$a_n > M \quad\text{whenever}\quad n > N.$$
Similarly, we say $\lim a_n = -\infty$ if for every real number $M$ there is an $N$ such that
$$a_n < M \quad\text{whenever}\quad n > N.$$

Example 2.4.5. If $r > 0$, prove that $\lim n^r = \infty$.

Solution: To prove that $\lim n^r = \infty$, we must show that for every $M$ there is an $N$ such that
$$n^r > M \quad\text{whenever}\quad n > N.$$
Clearly, when $M > 0$ we need only choose $N$ to be $M^{1/r}$; when $M \le 0$, any $N$ will do.


With $+\infty$ and $-\infty$ as possible limits of a sequence, we can now assert:

Theorem 2.4.6. Every monotone sequence has a limit.

The proof of this is left to the exercises.

Note that we must now make a distinction between a sequence converging and a sequence having a limit. A sequence may have a limit which is infinite, but a sequence which converges must have a finite limit.


Theorem 2.4.7. Let $\{a_n\}$ and $\{b_n\}$ be sequences of real numbers. Then

(a) if $a_n > 0$ for all $n$, then $\lim a_n = \infty$ if and only if $\lim 1/a_n = 0$;
(b) if $\{b_n\}$ is bounded below, then $\lim a_n = \infty$ implies $\lim(a_n + b_n) = \infty$;
(c) $\lim a_n = \infty$ if and only if $\lim(-a_n) = -\infty$;
(d) if $a_n \le b_n$ for all $n$, then $\lim a_n = \infty$ implies $\lim b_n = \infty$;
(e) if there is a positive constant $k$ such that $k \le b_n$ for all $n$, then $\lim a_n = \infty$ implies $\lim a_n b_n = \infty$.


Proof. We will prove (a) and (b) and leave (c), (d), and (e) to the exercises.

(a) If we are given an $\epsilon$, we will set $M = 1/\epsilon$. Conversely, if we are given an $M$, we will set $\epsilon = 1/M$. Then the statements
$$|1/a_n| < \epsilon \quad\text{and}\quad a_n > M$$
mean the same thing (since $a_n$ is positive) so that, if there is an $N$ such that one of these statements is true for all $n > N$, then the other statement is also true for all $n > N$. Thus, $\lim 1/a_n = 0$ if and only if $\lim a_n = \infty$.

(b) Let $\{b_n\}$ be bounded below by, say, $K$. Assuming $\lim a_n = \infty$, we wish to show that $\lim(a_n + b_n) = \infty$. Given $M \in \mathbb{R}$, the number $M - K$ is also in $\mathbb{R}$ and so, by our assumption that $\lim a_n = \infty$, we know there is an $N$ such that
$$a_n > M - K \quad\text{whenever}\quad n > N.$$
Then
$$a_n + b_n > M - K + K = M \quad\text{whenever}\quad n > N.$$
Thus, $\lim(a_n + b_n) = \infty$. □
Example 2.4.8. Find the following limits:

(a) $\lim \dfrac{2n^2 + 3}{n + 1}$;
(b) $\lim a^n$ for $a > 1$;
(c) $\lim(\sqrt{n} + (-1)^n)$.

Solution: (a) We factor the largest power of $n$ that occurs out of each of the denominator and the numerator. The result is
$$\frac{2n^2 + 3}{n + 1} = \frac{n^2(2 + 3/n^2)}{n(1 + 1/n)} = n\,\frac{2 + 3/n^2}{1 + 1/n}.$$
Now $\dfrac{2 + 3/n^2}{1 + 1/n} \ge 1$ for all $n$, since the numerator is at least 2 and the denominator is at most 2, and $\lim n = \infty$. Thus,
$$\lim \frac{2n^2 + 3}{n + 1} = \infty$$
by Theorem 2.4.7(e).

(b) Since $|1/a| < 1$, it follows from Example 2.3.5 that $\lim 1/a^n = 0$. Then $\lim a^n = +\infty$ by Theorem 2.4.7(a). Another proof of this fact is suggested in Exercise 2.4.7.

(c) Since $\sqrt{n} = n^{1/2}$, Example 2.4.5 implies that $\lim \sqrt{n} = \infty$. Then Theorem 2.4.7(b) implies that $\lim(\sqrt{n} + (-1)^n) = \infty$.




. Prove that lim 


Exercise Set 2.4 


Tell which of these sequences are non-increasing, non-decreasing, bounded? 
Justify your answers. 


2. Prove that the sequence of Example 1.2.11 converges and decide what number it converges to.

3. If $a_1 = 1$ and $a_{n+1} = (1 - 2^{-n})a_n$, prove that $\{a_n\}$ converges.

4. Let $\{d_n\}$ be a sequence of 0's and 1's and define a sequence of numbers $\{a_n\}$ by
$$a_n = d_1 2^{-1} + d_2 2^{-2} + \cdots + d_n 2^{-n}.$$
Prove that this sequence converges to a number between 0 and 1.

5. Let $\{s_n\}$ be the sequence of partial sums of a series with positive terms. That is,
$$s_n = \sum_{k=1}^{n} a_k \quad\text{with all } a_k > 0.$$
Prove that $\lim s_n$ exists (though it may not be finite).

6. Give an alternate proof to the result of Example 2.3.5 that does not use the Binomial Theorem. Instead, first show that $\{|a|^n\}$ is a non-increasing sequence. Then show that 0 is the only possible value for the limit.

7. Give an alternate proof of the result of Example 2.4.8(b) that does not use Example 2.3.5. Use the method of the previous exercise.
nd + 3n3 +2 _ 
ni-n+1 


mn 


. Prove that lim att = 00. 

10. Prove Theorem 2.4.6.

11. Prove part (c) of Theorem 2.4.7.
12. Prove part (d) of Theorem 2.4.7.
13. Prove part (e) of Theorem 2.4.7.

14. Suppose $\{a_n\}$ and $\{b_n\}$ are non-decreasing sequences that are interlaced in the sense that each term of the sequence $\{a_n\}$ is less than or equal to some term of the sequence $\{b_n\}$ and vice versa. Prove that $\lim a_n = \lim b_n$.




2.5. Cauchy Sequences 


In this section we will prove two of the most important theorems about convergence 
of sequences. The proofs are based on the Nested Interval Property, which we 
describe below. 


Nested Intervals. A nested sequence of closed bounded intervals is a sequence
$$I_1 \supset I_2 \supset I_3 \supset \cdots$$
in which each $I_n$ is a closed bounded interval and each interval in the sequence contains the next one. Thus, each of the intervals $I_n$ has the form $[a_n, b_n]$ for real numbers $a_n < b_n$. The nested condition means that $I_n \supset I_{n+1}$ for each $n$, that is,
$$a_n \le a_{n+1} < b_{n+1} \le b_n$$
for each $n$.

Theorem 2.5.1 (Nested Interval Property). If $I_1 \supset I_2 \supset I_3 \supset \cdots$ is a nested sequence of closed bounded intervals, then $\bigcap_n I_n \ne \emptyset$. That is, there is at least one point that is in all the intervals $I_n$.

Proof. Let $I_n = [a_n, b_n]$, as above. Then the sequence $\{a_n\}$ of left endpoints is a non-decreasing sequence which is bounded above (by $b_1$), and the sequence $\{b_n\}$ of right endpoints is a non-increasing sequence which is bounded below (by $a_1$). The Monotone Convergence Theorem (Theorem 2.4.1) implies that both sequences converge.

If $a = \lim a_n$ and $b = \lim b_n$, then $a \le b$ by Theorem 2.3.8. In fact,
$$a_n \le a \le b \le b_n$$
for each $n$. This means that $[a, b] \subset I_n$ for every $n$ and, hence, that $[a, b] \subset \bigcap_n I_n$.

The set $[a, b]$ is a closed interval if $a < b$ and a single point if $a = b$. In either case, it is non-empty. □


We leave to the exercises the problem of showing that this theorem is false if we 
don’t insist that the intervals are closed or if we don’t insist that they are bounded. 


The Bolzano-Weierstrass Theorem. A sequence $\{b_k\}$ is a subsequence of the sequence $\{a_n\}$ if it is made up of some of the terms of $\{a_n\}$, taken in the order that they appear in $\{a_n\}$. More precisely:

Definition 2.5.2. A sequence $\{b_k\}$ is a subsequence of the sequence $\{a_n\}$ if there is a strictly increasing sequence of natural numbers $\{n_k\}$ such that $b_k = a_{n_k}$.

Example 2.5.3. Give three examples of subsequences of the sequence
$$0,\ 3/2,\ -2/3,\ 5/4,\ -4/5,\ 7/6,\ -6/7,\ 9/8,\ \dots,\ (-1)^n + 1/n,\ \dots.$$
Does the original sequence converge? How about the three subsequences?

Solution:

(a) $3/2,\ 5/4,\ 7/6,\ \dots,\ 1 + 1/(2k),\ \dots$.
(b) $0,\ -2/3,\ -4/5,\ \dots,\ -1 + 1/(2k-1),\ \dots$.
(c) $3/2,\ 5/4,\ 9/8,\ \dots,\ 1 + 1/2^k,\ \dots$.

The original sequence clearly does not converge, but sequence (a) converges to 1, (b) converges to $-1$, and (c) converges to 1.
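Definition 2.5.2 describes a subsequence as a choice of strictly increasing indices $n_k$, and the three subsequences of Example 2.5.3 can be produced from exactly that data. In the sketch below the function names `a` and `subsequence` are illustrative assumptions; each index rule is passed as a function $k \mapsto n_k$.

```python
from fractions import Fraction

def a(n):
    """The n-th term (-1)^n + 1/n of the sequence in Example 2.5.3."""
    return Fraction((-1) ** n) + Fraction(1, n)

def subsequence(index, count):
    """Return a_{n_1}, ..., a_{n_count}, where n_k = index(k) is strictly increasing."""
    return [a(index(k)) for k in range(1, count + 1)]

even_terms = subsequence(lambda k: 2 * k, 4)      # n_k = 2k:     3/2, 5/4, 7/6, 9/8
odd_terms = subsequence(lambda k: 2 * k - 1, 4)   # n_k = 2k - 1: 0, -2/3, -4/5, -6/7
power_terms = subsequence(lambda k: 2 ** k, 3)    # n_k = 2^k:    3/2, 5/4, 9/8
```

The third list is a subsequence of the first as well, since every $2^k$ is even, which is consistent with both having limit 1.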


Theorem 2.5.4. If $\{a_n\}$ has a limit (possibly infinite), then each of its subsequences has the same limit.

Proof. We will prove this in the case of a finite limit. The other cases are similar and are covered in the exercises.

Suppose $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$. Then $\{n_k\}$ is an increasing sequence of natural numbers, and this implies that $n_k \ge k$ for all $k$ (Exercise 2.5.4).

Now suppose $\lim a_n = a$. Given $\epsilon > 0$, there is an $N$ such that
$$|a_n - a| < \epsilon \quad\text{whenever}\quad n > N.$$
Then $k > N$ implies $n_k > N$, since $n_k \ge k$. Thus,
$$|a_{n_k} - a| < \epsilon \quad\text{whenever}\quad k > N.$$
By definition, this means that $\lim a_{n_k} = a$. □


Theorem 2.5.5 (Bolzano-Weierstrass Theorem). Every bounded sequence of 
real numbers has a convergent subsequence. 


Proof. If $\{a_n\}$ is a bounded sequence, then it has an upper bound $M$ and a lower bound $m$. This means that every $a_n$ is contained in the interval $I_1 = [m, M]$. We will construct a nested sequence of closed bounded intervals
$$I_1 \supset I_2 \supset I_3 \supset \cdots \tag{2.5.1}$$
such that $I_k$ contains infinitely many of the terms of $\{a_n\}$ for each $k$ and the length of $I_k$ is $(M - m)/2^{k-1}$.

The first term of our sequence is $I_1$. Certainly $I_1$ contains infinitely many terms of $\{a_n\}$ (in fact, it contains all of them) and its length is $M - m$. The recursion relation for our inductive definition is as follows: if $I_1 \supset I_2 \supset \cdots \supset I_k$ have been chosen in such a way that $I_k$ contains infinitely many terms of $\{a_n\}$ and the length of $I_k$ is $(M - m)/2^{k-1}$, then we choose $I_{k+1}$ as follows. We cut $I_k$ into two closed intervals by dividing it at its midpoint. One of the two halves must contain infinitely many terms of $\{a_n\}$ since $I_k$ does. Let $I_{k+1}$ be the right half if it has this property; otherwise let it be the left half. Then $I_{k+1}$ is contained in $I_k$, has length $(M - m)/2^k$, and contains infinitely many terms of $\{a_n\}$. By induction, there exists an infinite sequence (2.5.1) with the required properties.

By the Nested Interval Theorem, there is a point $a$ that is in every one of the intervals $I_k$. Also, each interval $I_k$ contains infinitely many terms of the sequence $\{a_n\}$. We will inductively define a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ with the property that $a_{n_k} \in I_k$ for each $k$. We choose $n_1 = 1$ and define $n_{k+1}$ in terms of $n_k$ by the rule that $n_{k+1}$ is the first integer greater than $n_k$ such that $a_{n_{k+1}} \in I_{k+1}$. This is the basis for an inductive definition of the sequence we seek. Once this sequence of integers has been chosen, then $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$. We will show that this subsequence converges to $a$.

For each $k$, $a$ and $a_{n_k}$ both belong to $I_k$. This means the distance between them can be no greater than the length of $I_k$, which is $(M - m)2^{1-k}$. That is,
$$|a_{n_k} - a| \le \frac{M - m}{2^{k-1}}.$$
Since $\dfrac{M - m}{2^{k-1}} \to 0$, Theorem 2.3.1 implies that $\lim a_{n_k} = a$. □


Example 2.5.6. Construct a sequence {a,,} as follows: for each n let a, be the 
number obtained by replacing by @ all digits to the left of the decimal point in the 
decimal expansion of 10". Does this sequence have a convergent subsequence? 


Solution: This is a crazy sequence and it certainly does not appear to converge. However, each number in this sequence lies between 0 and 1 and so it is a bounded sequence. By the Bolzano-Weierstrass Theorem it has a convergent subsequence.


Cauchy Sequences. 


Definition 2.5.7. A sequence $\{a_n\}$ is said to be a Cauchy sequence if, for every $\epsilon > 0$, there is an $N$ such that
$$|a_n - a_m| < \epsilon \quad\text{whenever}\quad n, m > N.$$


Intuitively, this means we can make the terms of the sequence arbitrarily close 
to each other by going far enough out in the sequence. It is by no means obvious 
that this means that the sequence converges, but it does. 
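The Cauchy condition compares terms with one another rather than with a candidate limit, and that can be tested on a finite window of indices. The following sketch rests on assumptions made only for illustration: the helper name `cauchy_on_window`, the sample sequences, and the window width. Passing the test on a window does not prove that a sequence is Cauchy, but failing it for some $\epsilon$ and every choice of $N$ is how a sequence fails to be Cauchy.

```python
import math
from fractions import Fraction
from itertools import combinations

def cauchy_on_window(term, N, eps, width=60):
    """Check |a_n - a_m| < eps for all pairs of indices among the first `width` integers > N."""
    start = math.floor(N) + 1
    indices = range(start, start + width)
    return all(abs(term(n) - term(m)) < eps for n, m in combinations(indices, 2))

# a_n = (-1)^n / n: for n, m > N we have |a_n - a_m| <= 1/n + 1/m < 2/N, so N = 2/eps works.
eps = Fraction(1, 50)
alternating_decay_ok = cauchy_on_window(lambda n: Fraction((-1) ** n, n), 2 / eps, eps)

# a_n = (-1)^n: consecutive terms always differ by 2, so eps = 1 fails on every window.
alternating_ok = cauchy_on_window(lambda n: (-1) ** n, 1000, 1)
```

The second sequence is the one shown to have no limit in Example 2.1.8(c), and it fails the Cauchy test no matter how far out the window starts.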


Theorem 2.5.8. A sequence of real numbers $\{a_n\}$ is a Cauchy sequence if and only if it converges.

Proof. There are two things to prove here: the "if" and the "only if" parts. First we do the "if" part, that is, we will prove that a sequence is Cauchy if it converges.

Assume $\{a_n\}$ converges to a number $a$. Then, given $\epsilon > 0$, there is an $N$ such that
$$|a_n - a| < \epsilon/2 \quad\text{whenever}\quad n > N.$$
If $n, m > N$, then
$$|a_n - a_m| = |a_n - a + a - a_m| \le |a_n - a| + |a_m - a| < \epsilon/2 + \epsilon/2 = \epsilon.$$
Therefore, $\{a_n\}$ is Cauchy.

Now for the "only if" part. Suppose $\{a_n\}$ is Cauchy. We first prove that $\{a_n\}$ is bounded. In fact, there is an $N$ such that
$$|a_n - a_m| < 1 \quad\text{whenever}\quad n, m > N.$$
In particular, $|a_n - a_{N+1}| < 1$ for all $n > N$. This implies that
$$a_{N+1} - 1 < a_n < a_{N+1} + 1 \quad\text{whenever}\quad n > N.$$
Then $\max\{a_1, \dots, a_N, a_{N+1} + 1\}$ is an upper bound for $\{a_n\}$. Similarly, we have that $\min\{a_1, \dots, a_N, a_{N+1} - 1\}$ is a lower bound for $\{a_n\}$. Thus, $\{a_n\}$ is a bounded sequence.

We next use the Bolzano-Weierstrass Theorem to conclude there is a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ which converges to a number $a$. Finally, we use the definition of Cauchy sequence and what it means for $a_{n_k}$ to converge to $a$. Given $\epsilon > 0$, there are numbers $N_1$ and $N_2$ such that
$$|a_n - a_m| < \epsilon/2 \quad\text{whenever}\quad n, m > N_1$$
and
$$|a_{n_k} - a| < \epsilon/2 \quad\text{whenever}\quad k > N_2.$$
If $n > N_1$ and if we choose a $k > \max\{N_1, N_2\}$, then $n_k \ge k > N_1$ and so
$$|a_n - a| = |a_n - a_{n_k} + a_{n_k} - a| \le |a_n - a_{n_k}| + |a_{n_k} - a| < \epsilon/2 + \epsilon/2 = \epsilon.$$
This completes the proof that every Cauchy sequence is convergent. □
= k 
Example 2.5.9. Show that the sequence of partial sums of the series D1) ge 
converges. ag 
n , 
Solution: We have s, = DCD and so, for m > n, 
k=1 
m i oo 
1 1 1 4 
k = 
lom—el=| > CUES Oo San bei e 
Jk=n+1 k=nt+1 k=0 


Here we have used the fact that $k < 2^k$ for all $k$ and the fact that the geometric
series $\sum_{k=0}^{\infty} 2^{-k}$ has sum $\dfrac{1}{1 - 1/2} = 2$.

Since $\lim 1/2^n = 0$, by Example 2.3.5, given $\epsilon > 0$, there is an $N$ such that
$n > N$ implies $1/2^n < \epsilon$. Then $|s_m - s_n| < \epsilon$ for all $n, m$ with $m > n > N$. This
means that $\{s_n\}$ is Cauchy and, hence, converges.


Exercise Set 2.5 


1. Give an example of a nested sequence of bounded open intervals that does not 
have a point in its intersection. 


2. Give an example of a nested sequence of closed but unbounded intervals which 
does not have a point in its intersection. 


3. Prove that if J is a closed, bounded interval which is contained in the union 
of some collection of open intervals, then J is contained in the union of some 
finite subcollection of these open intervals. 


4. Prove by induction that if $\{n_k\}$ is an increasing sequence of natural numbers,
then $n_k \ge k$ for all $k$.




5. Which of the following sequences $\{a_n\}$ have a convergent subsequence? Justify
your answer.
(a) $a_n = (-2)^n$.

5 +(-1)"n 


on 
a 




6. For each of the following sequences $\{a_n\}$, find a subsequence which converges.
Justify your answer.
(a) $a_n = (-1)^n$.
(b) $a_n = \sin(n\pi/4)$.


n 
(c) an = al 1 with k, the largest integer k so that 2* < n. 


7. For each of the following sequences, determine how many different limits of
subsequences there are. Justify your answer.
(a) $\{1 + (-1)^n\}$.
(b) $\{\cos(n\pi/3)\}$.
(c) $1, 1/2, 1, 1/2, 1/3, 1, 1/2, 1/3, 1/4, 1, 1/2, 1/3, 1/4, 1/5, \dots$.

8. Does the sequence $\{\sin n\}$ have a convergent subsequence? Why?


9. Prove that a sequence which satisfies $|a_{n+1} - a_n| < 2^{-n}$ for all $n$ is a Cauchy
sequence.


10. Suppose a sequence $\{a_n\}$ has the property that for every $\epsilon > 0$, there is an
$N$ such that

$$|a_{n+1} - a_n| < \epsilon \quad\text{whenever}\quad n > N.$$

Is $\{a_n\}$ necessarily Cauchy? Prove it is or give an example where it is not.


1 ae! 
11. Let sn =} gop be the sequence of partial sums of the series } psp. Prove 
k=1 


that {s,} converges. Hint: Show that it is a Cauchy sequence. 


12. Given a series $\sum_{k=1}^{\infty} a_k$, set $s_n = \sum_{k=1}^{n} a_k$ and $t_n = \sum_{k=1}^{n} |a_k|$. Prove that $\{s_n\}$
converges if $\{t_n\}$ is bounded.


2.6. lim inf and lim sup 


According to the Bolzano-Weierstrass Theorem, a bounded sequence has a convergent
subsequence. In fact, a bounded sequence has many convergent subsequences
and these may converge to many different limits, as is illustrated by some of the
exercises in the previous section. Here we will show that there is a smallest closed
interval that contains all of these limits. The endpoints of this interval are the
lim inf and the lim sup of the sequence.


Given a sequence $\{a_n\}$, we construct two monotone sequences $\{i_n\}$ and $\{s_n\}$
with $\{a_n\}$ trapped in between. They are defined as follows:

(2.6.1)    $i_n = \inf\{a_k : k \ge n\}, \qquad s_n = \sup\{a_k : k \ge n\}.$

Note that the $i_n$ will all be $-\infty$ if $\{a_n\}$ is not bounded below and the $s_n$
will all be $+\infty$ if $\{a_n\}$ is not bounded above. However, if $\{a_n\}$ is bounded, say
$m \le a_n \le M$ for all $n$, then $m \le i_n \le s_n \le M$ for each $n$. Hence, in this case, the
numbers $i_n$ and $s_n$ are all finite and $\{i_n\}$ and $\{s_n\}$ are bounded sequences.




Theorem 2.6.1. Given a bounded sequence $\{a_n\}$, if $\{i_n\}$ and $\{s_n\}$ are defined as
above, then

(a) $\{i_n\}$ is a non-decreasing sequence;

(b) $\{s_n\}$ is a non-increasing sequence;

(c) $i_n \le a_n \le s_n$ for all $n$.


Proof. If $A_n = \{a_k : k \ge n\}$, then $A_{n+1} \subset A_n$ for each $n$. It follows from Theorem
1.5.7(e) that, for all $n$,

(2.6.2)    $s_{n+1} = \sup A_{n+1} \le \sup A_n = s_n$ and $i_{n+1} = \inf A_{n+1} \ge \inf A_n = i_n$.

Also, since $a_n \in A_n$, $i_n = \inf A_n \le a_n \le \sup A_n = s_n$. □


Since the sequences $\{i_n\}$ and $\{s_n\}$ are monotone, their limits exist.

Definition 2.6.2. If $\{a_n\}$ is a sequence and $\{i_n\}$ and $\{s_n\}$ are defined as above,
then we set

(2.6.3)    $\liminf a_n = \lim i_n, \qquad \limsup a_n = \lim s_n.$

Note that if $\{a_n\}$ is not bounded below, then $\liminf a_n = -\infty$, while if $\{a_n\}$ is
not bounded above, then $\limsup a_n = +\infty$.
Example 2.6.3. Find $\liminf a_n$ and $\limsup a_n$ if $a_n = (-1)^n + 1/n$.

Solution: As before, we let $i_n = \inf\{a_k : k \ge n\}$ and $s_n = \sup\{a_k : k \ge n\}$.

We claim $i_n = -1$ for all $n$. In fact,

$$-1 < (-1)^k + 1/k \quad\text{for all } k$$
implies
$$i_n = \inf\{(-1)^k + 1/k : k \ge n\} \ge -1.$$

Furthermore, $(-1)^k + 1/k$ approaches $-1$ for large odd $k$, so no number greater
than $-1$ is a lower bound for $\{a_k : k \ge n\}$. Thus, $i_n = -1$, as claimed. This implies
that $\liminf a_n = \lim i_n = -1$.

We claim that $1 < s_n \le 1 + 1/n$. In fact, the set $\{(-1)^k + 1/k : k \ge n\}$ contains
numbers greater than 1 no matter what $n$ is, and so

$$s_n = \sup\{(-1)^k + 1/k : k \ge n\} > 1.$$

Furthermore, $(-1)^k + 1/k \le 1 + 1/n$ if $k \ge n$. Thus, $1 < s_n \le 1 + 1/n$. This implies
that $\limsup a_n = \lim s_n = 1$.
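
The behaviour of the tail infima and suprema in this example can be checked numerically. The following is a minimal sketch, not part of the text: the helper names `a` and `tail_inf_sup` and the `cutoff` parameter are illustrative assumptions. Since a computer can only inspect finitely many terms, the minimum it returns is an upper estimate of $i_n$ (it approaches $-1$ as the cutoff grows), while the maximum is exact here because the largest term of the tail occurs at the first even index.

```python
def a(n):
    """Term a_n = (-1)^n + 1/n of the sequence in Example 2.6.3."""
    return (-1) ** n + 1 / n


def tail_inf_sup(n, cutoff):
    """Estimate i_n and s_n from the finitely many terms a_n, ..., a_cutoff.

    The true i_n and s_n are taken over the whole infinite tail, so the
    returned minimum is only an upper estimate of i_n.
    """
    tail = [a(k) for k in range(n, cutoff + 1)]
    return min(tail), max(tail)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        i_n, s_n = tail_inf_sup(n, cutoff=100_000)
        # s_n should satisfy 1 < s_n <= 1 + 1/n; i_n should be close to -1.
        print(f"n={n:5d}  i_n ~ {i_n:.5f}  s_n = {s_n:.5f}  1 + 1/n = {1 + 1/n:.5f}")
```

As $n$ grows, the printed values of $s_n$ decrease toward 1 while the estimates of $i_n$ stay near $-1$, in line with $\limsup a_n = 1$ and $\liminf a_n = -1$.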


Subsequential Limits. If $\{a_n\}$ is a sequence, then by a subsequential limit of
$\{a_n\}$ we mean a number which is the limit of some subsequence of $\{a_n\}$.


Theorem 2.6.4. Every subsequential limit of $\{a_n\}$ lies between $\liminf a_n$ and
$\limsup a_n$.




Proof. If $\{a_{n_k}\}$ is a convergent subsequence of $\{a_n\}$, Theorem 2.6.1(c) implies
$$i_{n_k} \le a_{n_k} \le s_{n_k},$$

where $i_n = \inf\{a_k : k \ge n\}$ and $s_n = \sup\{a_k : k \ge n\}$. The sequences $\{i_{n_k}\}$ and
$\{s_{n_k}\}$ are subsequences of $\{i_n\}$ and $\{s_n\}$, respectively, and, hence, have the same
limits, namely $\liminf a_n$ and $\limsup a_n$, by Theorem 2.5.4. It follows from Theorem
2.3.8 and the above inequalities that

$$\liminf a_n \le \lim a_{n_k} \le \limsup a_n.$$ □


Theorem 2.6.5. If $\{a_n\}$ is a sequence, then $\limsup a_n$ and $\liminf a_n$ are
subsequential limits of $\{a_n\}$.


Proof. We will show that $\limsup a_n$ is a subsequential limit of $\{a_n\}$. The same
statement for lim inf has a similar proof. We will assume that $\limsup a_n$ is a finite
number $s$. The case where $\limsup a_n = \infty$ is left as an exercise.


We must show that there is some subsequence of $\{a_n\}$ which converges to
$s = \limsup a_n$. We will construct such a sequence inductively. As before, we let
$s_n = \sup\{a_k : k \ge n\}$. For each $\epsilon > 0$, the number $s_n - \epsilon$ is less than $s_n$ and
so it is not an upper bound for $\{a_k : k \ge n\}$. This means there is an element of
$\{a_k : k \ge n\}$ which is greater than $s_n - \epsilon$ but less than or equal to $s_n$. We will
choose a sequence of such elements by induction.

We choose $n_1$ such that $s_1 - 1 < a_{n_1} \le s_1$. Since $s_{n_1} \le s_1$, this gives
$s_{n_1} - 1 < a_{n_1} \le s_1$. Suppose $n_1 < n_2 < \cdots < n_m$ have been chosen so that

(2.6.4)    $s_{n_j} - 1/j < a_{n_j} \le s_j$ for $j = 1, \dots, m$.

We then choose $n_{m+1} > n_m$ such that $s_{n_m + 1} - 1/(m+1) < a_{n_{m+1}} \le s_{n_m + 1}$, which is
possible by the observation above applied to the set $\{a_k : k \ge n_m + 1\}$.
However, $n_{m+1} \ge n_m + 1 \ge m + 1$ and $\{s_n\}$ is non-increasing, so $s_{n_{m+1}} \le s_{n_m + 1} \le s_{m+1}$.
In other words (2.6.4) holds with $m$ replaced by $m + 1$. This completes the induction
step and proves that there is an increasing sequence of natural numbers $\{n_j\}$ such
that (2.6.4) holds for all $j$.

Since both $s_{n_j} - 1/j \to s$ and $s_j \to s$, the subsequence $\{a_{n_j}\}$ also converges to
$s$ by the squeeze principle. □

A Criterion for Convergence.

Theorem 2.6.6. A sequence $\{a_n\}$ has limit $a$ if and only if

$$\limsup a_n = \liminf a_n = a.$$

Proof. We first prove that if $\limsup a_n = \liminf a_n = a$, then $\lim a_n$ exists and
equals $a$. By Theorem 2.6.1(c),
$$i_n \le a_n \le s_n,$$

where $i_n$ and $s_n$ are as before. Since $\lim i_n = \lim s_n = a$, it follows from the squeeze
principle that $\lim a_n = a$.

Next we assume $\lim a_n = a$. By Theorem 2.5.4 each subsequence of $\{a_n\}$ also
has limit $a$. Since $\limsup a_n$ and $\liminf a_n$ are subsequential limits of $\{a_n\}$, they
must both be equal to $a$. This completes the proof. □


Exercise Set 2.6

1. Find $\limsup a_n$ and $\liminf a_n$ for the following sequences:
(a) $a_n = (-1)^n$;
(b) $a_n = (-1/n)^n$;
(c) $a_n = \sin(n\pi/3)$.

2. Find lim inf and lim sup for the sequence of Exercise 2.5.6(c).

3. Find lim inf and lim sup for the sequence of Exercise 2.5.7(c).

4. If $\limsup a_n$ and $\limsup b_n$ are finite, prove that

$$\limsup (a_n + b_n) \le \limsup a_n + \limsup b_n.$$

5. If $\limsup a_n$ is finite, prove that $\liminf(-a_n) = -\limsup a_n$.

6. If $k \ge 0$ and $\limsup a_n$ is finite, prove that $\limsup k a_n = k \limsup a_n$.

7. If $a_n \ge 0$ and $b_n \ge 0$, prove that $\limsup a_n b_n \le (\limsup a_n)(\limsup b_n)$.

8. If $\{a_n\}$ and $\{b_n\}$ are non-negative sequences and $\{b_n\}$ converges, prove that

$$\limsup a_n b_n = (\limsup a_n)(\lim b_n).$$

9. Let $\{r_n\}_{n=1}^{\infty}$ be an enumeration of the rational numbers, that is, a sequence
of rational numbers in which each rational number appears exactly once. That
such a thing exists is proved in the appendix. Show that, for each $x \in \mathbb{R}$, there is
a subsequence of this sequence which converges to $x$. Hint: Use Exercise 1.4.7.

10. Prove Theorem 2.6.5 for lim sup in the case where $\limsup a_n = +\infty$.

11. Prove that $c$ is $\limsup a_n$ if and only if there is a subsequence of $\{a_n\}$ which
converges to $c$ but there is no subsequence of $\{a_n\}$ which converges to a number
greater than $c$.

12. Which numbers do you think are subsequential limits of $\{\sin n\}_{n=1}^{\infty}$? Can you
prove that your guess is correct?


Chapter 3 


Continuous Functions 


In this chapter we begin our study of functions of a real variable. The concepts of 
limit and continuity for such functions are of critical importance. 


3.1. Continuity 


We will be dealing with functions from a subset of $\mathbb{R}$ to $\mathbb{R}$. Usually in this chapter,
the domain of a function will be an interval (closed, open, or half-open, bounded
or unbounded) or a finite union of intervals. However, it is certainly possible to
consider functions which have much more complicated subsets of $\mathbb{R}$ as domain.


To define a function from a subset of R to R, we must specify a domain for 
the function and the rule or formula that specifies the value of the function at each 
point of that domain. For example, the following are descriptions of functions: 


(1) $f(x) = 1/x$ on $(0, \infty)$;
(2) $g(x) = 1/x$ on $\mathbb{R} \setminus \{0\}$;
(3) h(x) = sine on (0, 

(4) $k(x) = \sin x$ on $\mathbb{R}$;
(5) e( 


kn 


=e® on (0,1). 


Although a function may have a natural domain (that is, a largest subset of
$\mathbb{R}$ on which the formula describing it makes sense), we are at liberty to choose a
smaller domain for the function if we wish.


There are a number of special types of functions that we will deal with on a 
regular basis: 


(1) Polynomials: functions of the form $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$, where the
$a_k$ are constants for $k = 0, \dots, n$. If $a_n \ne 0$, then the degree of the polynomial
is $n$. The natural domain of a polynomial is $\mathbb{R}$.




(2) Rational functions: functions of the form $p/q$ with $p$ and $q$ polynomials.
The natural domain of a function of this form is the set of all real numbers
where the denominator is non-zero.

(3) Trigonometric functions: sin, cos, tan, cot, sec, csc.

(4) Inverse trigonometric functions: $\sin^{-1}$, $\tan^{-1}$, etc.

(5) Exponential and log functions: $e^x$ and $\ln x$.

(6) Power functions: $x^a$ for $a \in \mathbb{R}$. The natural domain is $\{x \in \mathbb{R} : x > 0\}$
unless $a$ is a rational number with an odd denominator; in this case $x^a$ is
defined for all real numbers $x$.


Elementary functions are functions that can be constructed from functions of 
the above types using addition, multiplication, quotients, and composition. It is 
not the case that all the functions we wish to consider are elementary functions. 


Continuity. 


Definition 3.1.1. Let $f$ be a function with domain $D \subset \mathbb{R}$ and let $a$ be an element
of $D$. We will say that $f$ is continuous at $a$ if, for each $\epsilon > 0$, there is a $\delta > 0$, such
that

(3.1.1)    $|f(x) - f(a)| < \epsilon$ whenever $x \in D$ and $|x - a| < \delta$.


There is a subtle difference between the definition of continuity given above
and the one that is usually given in calculus courses. The difference is that our
definition depends on the domain of the function. A given expression may not be
continuous at a point $a$ if given a certain domain containing $a$, and yet it may be
continuous at $a$ if it is given a smaller domain.


Example 3.1.2. Give an example of a function which is not continuous at a certain 
point of its domain but which is continuous at this point if a smaller domain is 
chosen for the function. 

Solution: Each $x \in \mathbb{R}$ is in exactly one of the intervals $[n, n+1)$ for $n \in \mathbb{Z}$.
Consider the function defined on $\mathbb{R}$ by

$$f(x) = x - n \quad\text{if}\quad x \in [n, n+1),\ n \in \mathbb{Z}.$$

The graph of this function is shown in Figure 3.1.1, which shows why this function
is called the sawtooth function. We will show that this function is not continuous
at 0 (or at any other integer for that matter). However, if its domain is restricted
to be the interval $[0, 1)$, then it is continuous at 0.

Now $f(x) = x$ on $[0, 1)$ and $f(x) = x + 1$ on $[-1, 0)$. Suppose $\epsilon$ is greater than
0 but less than 1/2. Then, for any $\delta > 0$, the interval $(-\delta, \delta)$ will contain points of
$(-1/2, 0)$ and for any such point $x$,

$$|f(x) - f(0)| = |x + 1 - 0| > 1/2 > \epsilon.$$

Thus, there is no way to choose $\delta$ such that $|f(x) - f(0)| < \epsilon$ whenever $|x - 0| < \delta$.
This means that $f$ is not continuous at 0. The same argument works at any other
integer $n$.




Figure 3.1.1. The Sawtooth Function. 


On the other hand, suppose we define a new function $g$ which is the same as
$f$, but with domain cut down to be just $D = [0, 1)$. Then $g(x) = x$ on $D$. If, for a
given $\epsilon > 0$, we choose $\delta = \epsilon$, then

$$|g(x) - g(0)| = |x| < \epsilon \quad\text{whenever } x \in D \text{ and } |x - 0| = |x| < \delta.$$
Thus, $g$ is continuous at 0.
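
The failure of continuity at 0 on the full domain can be made concrete. The sketch below is an illustration only: the function names `sawtooth` and `witness_of_discontinuity` are hypothetical, and the particular witness $x = -\min(\delta, 1/2)/2$ is just one convenient point of $(-\delta, \delta) \cap (-1/2, 0)$. For every $\delta$ tried, it produces a point within $\delta$ of 0 whose function value differs from $f(0)$ by more than 1/2.

```python
import math


def sawtooth(x):
    """f(x) = x - n for x in [n, n+1), n an integer."""
    return x - math.floor(x)


def witness_of_discontinuity(delta):
    """Return a point x with |x - 0| < delta but |f(x) - f(0)| > 1/2.

    Any point of (-delta, delta) that also lies in (-1/2, 0) works;
    here we take x = -min(delta, 1/2) / 2 (an arbitrary choice).
    """
    return -min(delta, 0.5) / 2


if __name__ == "__main__":
    for delta in (1.0, 0.1, 1e-3, 1e-6):
        x = witness_of_discontinuity(delta)
        gap = abs(sawtooth(x) - sawtooth(0))
        print(f"delta={delta:g}  x={x:g}  |f(x) - f(0)| = {gap:.6f}")
```

No such witness exists once the domain is cut down to $[0, 1)$, because the points to the left of 0 are no longer in the domain.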
Definition 3.1.3. We will simply say that a function with domain D is continuous 


if it is continuous at every point of D. 


The technique we use to show that $f$ is continuous at $a$, using just the definition
of continuity, is similar to the technique we used in the previous chapter to prove
that a sequence converges to a given limit. That is, given an $\epsilon > 0$, we use a series
of equalities and inequalities to get $|f(x) - f(a)|$ dominated by a simple expression
which we can easily see is less than $\epsilon$ when $|x - a|$ is less than a certain number
(depending on $\epsilon$ but not on $x$). We choose this number as our $\delta$.

Example 3.1.4. Prove that $f(x) = x^2$ is continuous at $x = 2$.


Solution: We have
$$|f(x) - f(2)| = |x^2 - 4| = |x + 2||x - 2|.$$

If we insist that $|x - 2| < 1$, then $1 < x < 3$ and so $|x + 2| < 5$. Thus, given $\epsilon > 0$,
if we choose $\delta = \min\{1, \epsilon/5\}$, then

$$|f(x) - f(2)| = |x + 2||x - 2| < 5|x - 2| < \epsilon \quad\text{whenever}\quad |x - 2| < \delta.$$
This proves that $f$ is continuous at 2.
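
A quick numerical sanity check of this choice of $\delta$ is sketched below. It is not a proof, only a sampling test, and the names `delta_for_square`, `check_square`, and the `samples` parameter are assumptions introduced for the illustration.

```python
def delta_for_square(eps):
    """The delta chosen in Example 3.1.4 for f(x) = x^2 at the point 2."""
    return min(1.0, eps / 5.0)


def check_square(eps, samples=10_000):
    """Sample points x with |x - 2| < delta and confirm |x^2 - 4| < eps."""
    delta = delta_for_square(eps)
    for j in range(1, samples):
        # offsets run through the open interval (-delta, delta)
        offset = delta * (2 * j / samples - 1)
        x = 2 + offset
        if abs(x * x - 4) >= eps:
            return False
    return True


if __name__ == "__main__":
    for eps in (10.0, 1.0, 0.01, 1e-6):
        print(f"eps={eps:g}  delta={delta_for_square(eps):g}  ok={check_square(eps)}")
```

The cap at 1 matters for large $\epsilon$: without it, the bound $|x + 2| < 5$ used in the estimate would no longer be available.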


Example 3.1.5. Prove that 1/x is continuous at a if a > 0. 


Solution: We work on the expression $|1/x - 1/a|$:

(3.1.2)    $\left|\dfrac{1}{x} - \dfrac{1}{a}\right| = \dfrac{|x - a|}{xa} < \dfrac{2|x - a|}{a^2}$

if $x > a/2$. Thus, if we choose $\delta = \min(a/2, a^2\epsilon/2)$, then $|x - a| < \delta$ implies that
$|x - a| < a/2$ and $|x - a| < a^2\epsilon/2$. The first of these implies that $a/2 < x$ and this
along with $|x - a| < a^2\epsilon/2$ implies that $|f(x) - f(a)| < \epsilon$ because of (3.1.2).
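
The same kind of sampling check can be applied to $1/x$. The sketch below assumes the choice $\delta = \min(a/2,\ a^2\epsilon/2)$, which follows from the estimate $|1/x - 1/a| = |x - a|/(xa) < 2|x - a|/a^2$ for $x > a/2$; the names `delta_for_reciprocal` and `check_reciprocal` are hypothetical.

```python
def delta_for_reciprocal(a, eps):
    """A delta that works for f(x) = 1/x at a point a > 0 (assumed choice)."""
    return min(a / 2, a * a * eps / 2)


def check_reciprocal(a, eps, samples=10_000):
    """Sample points x with |x - a| < delta and confirm |1/x - 1/a| < eps."""
    delta = delta_for_reciprocal(a, eps)
    for j in range(1, samples):
        # x runs through the open interval (a - delta, a + delta);
        # since delta <= a/2, every sampled x is larger than a/2 > 0
        x = a + delta * (2 * j / samples - 1)
        if abs(1 / x - 1 / a) >= eps:
            return False
    return True


if __name__ == "__main__":
    for a in (0.01, 1.0, 50.0):
        for eps in (1.0, 1e-3):
            d = delta_for_reciprocal(a, eps)
            print(f"a={a:g}  eps={eps:g}  delta={d:g}  ok={check_reciprocal(a, eps)}")
```

Note how $\delta$ shrinks like $a^2$ as $a$ approaches 0: the closer the point is to 0, the smaller the neighborhood that is needed for the same $\epsilon$.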


An Alternate Characterization of Continuity. There is an alternate character- 
ization of continuity that will allow us to use the theorems of the previous chapter 
to easily prove the standard theorems concerning continuous functions: 


Theorem 3.1.6. Let $f$ be a function with domain $D$ and suppose $a \in D$. Then $f$
is continuous at $a$ if and only if, whenever $\{x_n\}$ is a sequence in $D$ which converges
to $a$, then the sequence $\{f(x_n)\}$ converges to $f(a)$.


Proof. We first prove the “only if” part; that is, we assume $f$ is continuous and
proceed to prove the statement about sequences. Let $\{x_n\}$ be a sequence in $D$ with
$x_n \to a$. Given $\epsilon > 0$, there is a $\delta > 0$ such that

$$|f(x) - f(a)| < \epsilon \quad\text{whenever}\quad x \in D \text{ and } |x - a| < \delta.$$
For this $\delta$, there is an $N$ such that
$$|x_n - a| < \delta \quad\text{whenever}\quad n > N.$$

On combining these statements, we conclude

$$|f(x_n) - f(a)| < \epsilon \quad\text{whenever}\quad n > N.$$
Thus, $f(x_n) \to f(a)$. This completes the proof of the “only if” half of the theorem.


We will prove the “if” part by proving the contrapositive; that is, we will
prove that if $f$ is not continuous at $a$, then there is a sequence $\{x_n\}$ in $D$ such that
$x_n \to a$ but $\{f(x_n)\}$ does not converge to $f(a)$.

The assumption that $f$ is not continuous at $a$ means that there is an $\epsilon > 0$ for
which no $\delta$ can be found for which (3.1.1) is true. This means that, no matter what
$\delta$ we choose, there is always an $x \in D$ such that

$$|x - a| < \delta \quad\text{but}\quad |f(x) - f(a)| \ge \epsilon.$$

In particular, for each of the numbers $1/n$ for $n \in \mathbb{N}$ we may choose an $x_n \in D$
such that

$$|x_n - a| < 1/n \quad\text{but}\quad |f(x_n) - f(a)| \ge \epsilon.$$
These numbers form a sequence $\{x_n\}$ which converges to $a$ (since $1/n \to 0$) but
whose image sequence $\{f(x_n)\}$ does not converge to $f(a)$. This completes the proof
of the “if” part of the theorem. □


Combining this with the Main Limit Theorem yields the following: 


Theorem 3.1.7. If $r$ is a positive rational number, then the function $f(x) = x^r$ is
continuous on its natural domain.


Proof. The natural domain $D$ of $f(x) = x^r$ is $\mathbb{R}$ if $r$ has an odd denominator and is
the set of non-negative real numbers if $r$ has an even denominator when written in
lowest terms. In either case, if $a \in D$ and $\{x_n\}$ is a sequence in $D$ which converges
to $a$, then $\{x_n^r\}$ converges to $a^r$ by parts (e) and (f) of the Main Limit Theorem
(Theorem 2.3.6). This implies that $x^r$ is continuous by the previous theorem.




Remark 3.1.8. We will eventually prove that the functions $x^a$ for $a \in \mathbb{R}$, $e^x$, $\ln x$,
and the inverse trigonometric functions are all continuous. In the meantime, we will
assume this is true whenever it is convenient to do so in an exercise or example. The 
continuity of the trigonometric functions is usually proved adequately in elementary 
calculus and so we will use the continuity of the trigonometric functions whenever 
it is needed. 


Combinations of Continuous Functions. If $f$ and $g$ are functions with domains
$D_f$ and $D_g$, then $f + g$ and $fg$ have domain $D = D_f \cap D_g$, and $f/g$ has domain
$\{x \in D : g(x) \ne 0\}$.


Theorem 3.1.9. Let $f$ and $g$ be functions with domains $D_f$ and $D_g$. Assume $f$
and $g$ are both continuous at a point $a \in D = D_f \cap D_g$, and let $c$ be a constant.
Then

(a) $cf$ is continuous at $a$;
(b) $f + g$ is continuous at $a$;
(c) $fg$ is continuous at $a$;
(d) $f/g$ is continuous at $a$, provided $g(a) \ne 0$.


Proof. These are all proved using the same technique used to prove the previous
theorem: combine Theorem 3.1.6 with the corresponding part of the Main Limit
Theorem. We will give proofs of parts (a), (b), and (c) and pose part (d) as an
exercise.

If $f$ and $g$ are continuous at $a$ and $\{x_n\}$ is any sequence in $D$ which converges
to $a$, then Theorem 3.1.6 tells us that $\{f(x_n)\}$ converges to $f(a)$ and $\{g(x_n)\}$
converges to $g(a)$. By parts (a), (b), and (c) of the Main Limit Theorem (Theorem
2.3.6), $\{cf(x_n)\}$ converges to $cf(a)$, $\{f(x_n) + g(x_n)\}$ converges to $f(a) + g(a)$, and
$\{f(x_n)g(x_n)\}$ converges to $f(a)g(a)$. Therefore, by Theorem 3.1.6 again, $cf$, $f + g$,
and $fg$ are continuous at $a$. □


Example 3.1.10. Prove that each polynomial is continuous on all of R and each 
rational function is continuous at all points where its denominator is not zero. 


Solution: Every positive integral power of $x$ is continuous on $\mathbb{R}$ by Theorem
3.1.7. By (a) of the above theorem, each constant times a power of $x$ is also
continuous. Then (b) of the theorem implies that every polynomial is continuous
on $\mathbb{R}$ and (d) implies that every rational function is continuous at points where its
denominator is not zero.


Composition of Continuous Functions. If $f$ is a function with domain $D_f$ and
$g$ is a function with domain $D_g$, then the composite function $f \circ g$ has domain
$D_{f \circ g} = \{x \in D_g : g(x) \in D_f\}$. Suppose $a$ is in this set, so that $a \in D_g$ and
$g(a) \in D_f$. Then we can ask if $f \circ g$ is continuous at $a$. The following theorem
answers this question. Its proof is left to the exercises.


Theorem 3.1.11. With $f$ and $g$ as above, let $a$ be in the domain of $f \circ g$. Then
$f \circ g$ is continuous at $a$ if $g$ is continuous at $a$ and $f$ is continuous at $g(a)$.




Example 3.1.12. Prove that $f(x) = \dfrac{1}{\sqrt{1 - x^2}}$ is continuous as a function on its
natural domain.

Solution: The function $f$ has as natural domain the interval $(-1, 1)$, since it
is for points in this interval and those points alone that $\sqrt{1 - x^2}$ is defined and
non-zero. The function $1 - x^2$ is continuous on $(-1, 1)$ because it is a polynomial.
The square root function is continuous on $(0, \infty)$ by Theorem 3.1.7. Thus, the
composition $\sqrt{1 - x^2}$ is continuous by Theorem 3.1.11. Finally, $f$ is continuous by
part (d) of Theorem 3.1.9.


Exercise Set 3.1 


1. If $f$ is a function with domain $(0, 1]$, what is the domain of $f(x^2 - 1)$?


2 
abetee : Nan es Ly ees ere A 
2. What is the natural domain of the function z 7 With this as its domain, 
Gre 


is this function continuous? Why? 


3. Prove that I has natural domain R and is continuous. 


4. Show that the function $f(x) = |x|$ is continuous on all of $\mathbb{R}$.

5. Assuming sin is continuous, prove that $\sin(x^2 - 4)$ is continuous.

6. Prove (d) of Theorem 3.1.9.

7. Prove Theorem 3.1.11.

8. We know $\sqrt{x}$ is continuous at all $a \ge 0$, by Theorem 3.1.7. Give another proof
of this fact using only the definition of continuity (Definition 3.1.1).


9. Consider the function 


-1 if «<0. 
Is this function continuous if its domain is R? Is it continuous if its domain is 
cut down to {x € R: x > 0}? How about if its domain is {4 € R: x < 0}? 


10. Let $f$ be a function with domain $D$ and suppose $f$ is continuous at some point
$a \in D$. Prove that, for each $\epsilon > 0$, there is a $\delta > 0$ such that

$$|f(x) - f(y)| < \epsilon \quad\text{whenever}\quad x, y \in D \cap (a - \delta, a + \delta).$$


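11. Prove that the function $f(x) = \begin{cases} \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$ is not continuous at 0.


12. Prove that the function $f(x) = \begin{cases} x\sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$ is continuous at 0.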




3.2. Properties of Continuous Functions 


Continuous functions on closed bounded intervals have a number of highly useful 
properties. We explore some of these in this section. 


Maximum and Minimum Values. A function $f$ with domain $D$ is said to be
bounded above on $S \subset D$ if and only if the set $f(S) = \{f(x) : x \in S\}$ is bounded
above. This is true if and only if

$$\sup_S f = \sup\{f(x) : x \in S\}$$

is finite. Similarly, $f$ is bounded below on $S$ if $f(S)$ is bounded below and this is
true if and only if

$$\inf_S f = \inf\{f(x) : x \in S\}$$
is finite. If $f$ is bounded above and below on $S$, then we say $f$ is bounded on $S$. If
$f$ is bounded on its domain $D$, then it is said to be a bounded function.


Just as a bounded set may have a finite sup but may not have a maximum
element (the sup may not belong to the set), a function $f$ may be bounded above
on $S$ without having a maximum value (this happens if $\sup_S f$ is not a value that $f$
assumes on $S$). However, if $f$ is a continuous function on a closed bounded interval,
then the situation is particularly nice.


Theorem 3.2.1. If $f$ is a continuous function on a closed bounded interval $I$, then
$f$ is bounded on $I$ and, in fact, it assumes both a minimum and a maximum value
on $I$.


Proof. We will prove that $M = \sup_{x \in I} f(x)$ is finite and, in fact, is a value that $f$
takes on somewhere on $I$. The analogous fact for $\inf_{x \in I} f(x)$ has the same proof.

We will construct a sequence $\{x_n\}$ in $I$ with the property that $\lim f(x_n) = M$.
If $M$ is finite, we choose an $x_n \in I$ such that $M - 1/n < f(x_n) \le M$. This is
possible because $M$ is the least upper bound of the set $\{f(x) : x \in I\}$. On the
other hand, if $M = +\infty$, we choose $x_n \in I$ such that $n < f(x_n)$. In either case, we
have $\lim f(x_n) = M$.

Since $I$ is a bounded interval, the sequence $\{x_n\}$ is bounded. By the Bolzano-Weierstrass
Theorem, this sequence has a convergent subsequence $\{x_{n_k}\}$. Since $I$ is
closed, this subsequence converges to a point $c$ of $I$ (Exercise 2.2.13). Since $f$ is
continuous at $c$ and $\lim x_{n_k} = c$, we have $\lim f(x_{n_k}) = f(c)$. Since we also have
$\lim f(x_{n_k}) = M$, we conclude that $M$ is finite and equal to $f(c)$. □


Each of the hypotheses of the above theorem is necessary in order for the
conclusion to hold. This is illustrated by the following example and some of the
exercises.


Example 3.2.2. Give examples of functions on [0,1] which are 


(1) unbounded; 


(2) bounded but with no maximum value. 




Solution: (1) Let
$$f(x) = \begin{cases} 1 & \text{if } x \le 1/2, \\ \dfrac{1}{2x - 1} & \text{if } x > 1/2. \end{cases}$$
This function is clearly unbounded on $[0, 1]$ since it blows up as $x$ approaches 1/2
from the right. Note that $f$ is not continuous at 1/2.


(2) Let 
| fae if 2 <1/2, 
F(a) {i if x>1/2. 


This function is bounded on $[0, 1]$ and its sup on this interval is 1, but it never takes
on the value 1 on the interval. Again, this function is not continuous at 1/2.


Exercises 3.2.4 and 3.2.5 ask the student to come up with examples where the 
conclusion of the theorem fails if the hypotheses on I are not satisfied even though 
the function f is continuous on J. 


Intermediate Value Theorem. The next theorem says that if a continuous 
function on an interval takes on two values, then it takes on every value in between. 
Its proof uses the Nested Interval Theorem. 


Theorem 3.2.3 (Intermediate Value Theorem). Let f be defined and contin- 
uous on an interval containing the points a and b and assume that a <b. If y is 
any number between f(a) and f(b), then there is a number c with a <c <b such 
that f(c) = y. 


Proof. Let $a_1 = a$ and $b_1 = b$ and consider the closed interval $I_1 = [a_1, b_1]$. We
are given that $y$ lies between $f(a_1)$ and $f(b_1)$. We will construct a nested sequence
of closed intervals with the same property. That is, we will prove by induction that
there is a sequence of closed intervals $\{I_k = [a_k, b_k]\}$ such that, for all $k \ge 1$,

(1) the length of $I_k$ is $(b - a)/2^{k-1}$,
(2) $y$ lies between $f(a_k)$ and $f(b_k)$.

Suppose it is possible to choose $\{I_1 \supset I_2 \supset \cdots \supset I_n\}$ so that (1) and (2) hold for
$k \le n$. Then we cut $I_n$ into two halves that have only the midpoint $c_n$ of $I_n$ in
common. If $y$ lies between $f(a_n)$ and $f(b_n)$, then it either lies between $f(a_n)$ and
$f(c_n)$ or it lies between $f(c_n)$ and $f(b_n)$ (Exercise 3.2.6). If only one of these is
true, then choose $I_{n+1}$ to be the corresponding half of $I_n$. If both are true, then
choose $I_{n+1}$ to be the right half of $I_n$. This results in a choice for $I_{n+1}$ that satisfies
(1) and (2) for $k = n + 1$. This completes the induction step of the construction
and, hence, the proof that a nested sequence of intervals satisfying (1) and (2) can
be constructed.


By the Nested Interval Property, there is a point $c$ in the intersection of all the
intervals $I_n$. By hypothesis $f$ is continuous at $c$ and so, given $\epsilon > 0$, there is a $\delta > 0$
such that

(3.2.1)    $f(c) - \epsilon < f(x) < f(c) + \epsilon$ whenever $x \in I$, $|x - c| < \delta$.

Now the length of $I_n$ is $L/2^{n-1}$, where $L$ is the length of $I$. Since $\lim L/2^{n-1} =
2L \lim (1/2)^n = 0$, the length of $I_n$ will be less than $\delta$ for $n$ sufficiently large. Suppose
$n$ is this large. Then $|x - c| < \delta$ for all $x \in I_n$, since $c \in I_n$. By (3.2.1),

$$f(c) - \epsilon < f(a_n) < f(c) + \epsilon \quad\text{and}\quad f(c) - \epsilon < f(b_n) < f(c) + \epsilon.$$

Taken together with the fact that $y$ lies between $f(a_n)$ and $f(b_n)$, these inequalities
imply that

$$f(c) - \epsilon < y < f(c) + \epsilon \quad\text{or}\quad |f(c) - y| < \epsilon.$$

This is only possible for all positive $\epsilon$ if $f(c) = y$. This completes the proof. □
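
The interval-halving construction in this proof is also a practical way to locate a point $c$ with $f(c) = y$. The following is a minimal sketch under the hypotheses of the theorem ($f$ continuous on $[a, b]$, $y$ between $f(a)$ and $f(b)$); the function name `bisect_ivt` and the fixed number of halving steps are assumptions made for the illustration.

```python
def bisect_ivt(f, a, b, y, steps=60):
    """Follow the interval-halving construction from the proof of the
    Intermediate Value Theorem.

    Assumes f is continuous on [a, b] and y lies between f(a) and f(b).
    Returns the midpoint of the final interval, an approximation to a
    point c with f(c) = y.
    """
    def between(u, v):
        return min(u, v) <= y <= max(u, v)

    if not between(f(a), f(b)):
        raise ValueError("y must lie between f(a) and f(b)")

    for _ in range(steps):
        c = (a + b) / 2
        # If y lies between f(c) and f(b), keep the right half (this also
        # covers the case where both halves qualify, as in the proof).
        # Otherwise y must lie between f(a) and f(c); keep the left half.
        if between(f(c), f(b)):
            a = c
        else:
            b = c
    return (a + b) / 2


if __name__ == "__main__":
    # Solve x^2 = 2 on [0, 2]; the result approximates the square root of 2.
    root = bisect_ivt(lambda x: x * x, 0.0, 2.0, 2.0)
    print(root, root * root)
```

Each pass halves the length of the interval, so after $n$ steps the point found is within $(b - a)/2^n$ of a solution, mirroring property (1) in the proof.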


This is another example of a theorem which is not true if the function is not
required to be continuous (see Exercise 3.2.7).


Image of an Interval. 


Theorem 3.2.4. If $f$ is a continuous function defined on a closed bounded interval
$I = [a, b]$, then $f(I)$ is also a closed, bounded interval or it is a single point.


Proof. By Theorem 3.2.1, $f$ has a maximum value $M$ and a minimum value $m$ on
$I$. By Theorem 3.2.3, $f$ takes on every value between $m$ and $M$ on $I$. Therefore
the image of $I$ is exactly $[m, M]$. This is a closed interval if $m \ne M$, and it is a
point otherwise. □


Inverse Functions. We learn in calculus that a function which is monotone
increasing or monotone decreasing on an interval has an inverse function. Here a
function $f$ is monotone increasing on $I$ if $f(x) < f(y)$ whenever $x, y \in I$ and $x < y$.
A function $f$ is monotone decreasing on $I$ if $f(x) > f(y)$ whenever $x, y \in I$ and
$x < y$. A function which is monotone increasing or monotone decreasing on $I$ is
said to be strictly monotone on $I$. For monotone functions, there is a converse to
the previous theorem.


Theorem 3.2.5. If f is strictly monotone on I and its range f(I) is an interval, 
then f is continuous on I. 


Proof. Suppose $f$ is monotone increasing. Let $f(I) = [s, t]$. Given $c \in I$, we will
prove that $f$ is continuous at $c$. We do this first in the case where $c$ is not an
endpoint of $I = [a, b]$.

Given $\epsilon > 0$, let $u = \max\{s, f(c) - \epsilon\}$ and $v = \min\{t, f(c) + \epsilon\}$. Then $u$ and $v$
are points of $[s, t]$ and

$$f(c) - \epsilon \le u \le f(c) \le v \le f(c) + \epsilon.$$

Note that the only way one of the inequalities $u \le f(c) \le v$ can be an equality is if
$f(c)$ is one of the endpoints $s$ or $t$. However, this cannot happen, since $c$ is not an
endpoint of $I$. Thus, $u < f(c) < v$.

Since $f(I) = [s, t]$, there are points $p, q \in I$ such that $f(p) = u$ and $f(q) = v$.
Since $f$ is monotone increasing,
$$p < c < q.$$

We choose $\delta = \min\{q - c, c - p\}$. Then $|x - c| < \delta$ implies $p < x < q$ and this
implies

$$f(c) - \epsilon \le u < f(x) < v \le f(c) + \epsilon, \quad\text{that is,}\quad |f(x) - f(c)| < \epsilon.$$
This proves that $f$ is continuous at $c$ in the case where $c$ is not an endpoint of $I$.

If $c$ is an endpoint of $I$, then the argument is the same except that we only have
to concern ourselves with points that lie to one side of $c$ and of $f(c)$. The details
are left to the exercises.

It remains to prove that a monotone decreasing function on $I$ with a closed
interval for its range is continuous. However, if $g$ is monotone decreasing, then
$f = -g$ is monotone increasing, also has a closed interval as image, and, hence, is
continuous by the above. But if $-g$ is continuous, then so is $g = (-1)(-g)$. □


Theorem 3.2.6. A continuous, strictly monotone function $f$ on a closed interval
$I$ has a continuous inverse function defined on $J = f(I)$. That is, there is a
continuous function $g$, with domain $J$, such that $g(f(x)) = x$ for all $x \in I$ and
$f(g(y)) = y$ for all $y \in J$.


Proof. Since $f$ is strictly monotone, for each $y \in J$ there is exactly one $x \in I$ such
that $f(x) = y$. We set $g(y) = x$. Then, by the choice of $x$, we have $f(g(y)) =
f(x) = y$ and $g(f(x)) = g(y) = x$.

The function $g$ is strictly monotone because $f$ is strictly monotone. Furthermore,
the range of $g$ is $I$. By the previous theorem, this implies that $g$ is continuous. □


Exercise Set 3.2 


1. Find the maximum and minumum values of the function f(a) = «* — 2 on the 


interval [@, 3). 

2. Prove that if $f$ is a continuous function on a closed bounded interval $I$ and if
$f(x)$ is never 0 for $x \in I$, then there is a number $m > 0$ such that $f(x) \ge m$
for all $x \in I$ or $f(x) \le -m$ for all $x \in I$.


3. Prove that if $f$ is a continuous function on a closed bounded interval $[a, b]$ and
if $(x_0, y_0)$ is any point in the plane, then there is a closest point to $(x_0, y_0)$ on
the graph of $f$.

4. Find an example of a function which is continuous on a bounded (but not 
closed) interval J but is not bounded. Then find an example of a function 
which is continuous and bounded on a bounded interval J but does not have a 
maximum value. 


5. Find an example of a function which is continuous on a closed (but not bounded)
interval J but is not bounded. Then find an example of a function which is 
continuous and bounded on a closed interval J but does not have a maximum 
value. 


6. Show that if $a$, $b$, $c$, and $x$ are numbers and $x$ is between $a$ and $b$, then it is also
between either $a$ and $c$ or $b$ and $c$.




7. Give an example of a function defined on the interval $[0, 1]$ which does not take
on every value between $f(0)$ and $f(1)$.

8. Show that if $f$ and $g$ are continuous functions on the interval $[a, b]$ such that
$f(a) < g(a)$ and $g(b) < f(b)$, then there is a number $c \in (a, b)$ such that
$f(c) = g(c)$.

9. Let $f$ be a continuous function from $[0, 1]$ to $[0, 1]$. Prove there is a point
$c \in [0, 1]$ such that $f(c) = c$; that is, show that $f$ has a fixed point. Hint:
Apply the Intermediate Value Theorem to the function $g(x) = f(x) - x$.


10. Use the Intermediate Value Theorem to prove that, if n is a natural number, 
then every positive number a has a positive nth root. 


11. Prove that a polynomial of odd degree has at least one real root. 


12. Use the Intermediate Value Theorem to prove that if f is a continuous function
on an interval [a, b] and if f(x) ≤ m for every x ∈ [a, b), then f(b) ≤ m.


13. Prove that if f is strictly increasing on [a, b], then its inverse function is strictly
increasing on [f(a), f(b)].


3.3. Uniform Continuity 


Compare the definition of continuity given in Definition 3.1.1 with the following 
definition. 


Definition 3.3.1. If f is a function with domain D, then f is said to be uniformly
continuous on D if for each ε > 0 there is a δ > 0 such that

(3.3.1) |f(x) − f(a)| < ε whenever x, a ∈ D and |x − a| < δ.

By contrast, Definition 3.1.1 tells us that f is continuous on D if for each a ∈ D
and each ε > 0 there is a δ > 0 such that

|f(x) − f(a)| < ε whenever x ∈ D and |x − a| < δ.

These two definitions appear to be identical until one examines them closely.
The difference is subtle but extremely important. In the definition of uniform
continuity, given ε, a single δ must be chosen that works for all points a ∈ D, while
in the definition of continuity, δ is allowed to depend on a.


Example 3.3.2. Find a function which is continuous on its domain but not
uniformly continuous.

Solution: We claim that the function f(x) = 1/x with domain (0, 1] is
continuous but not uniformly continuous on (0, 1].

It is continuous because x is continuous on (0, 1] and is never 0 on this set.
Thus, Theorem 3.1.9(d) implies that 1/x is continuous at each point of (0, 1].

On the other hand, if we attempt to verify that f is uniformly continuous, we
run into trouble. Given ε > 0, we try to find a δ > 0 such that

|1/x − 1/a| < ε whenever a, x ∈ (0, 1] and |x − a| < δ.


Figure 3.3.1. The Function 1/x on (0, 1]


However, for any δ > 0, if x and a are chosen so that 0 < x < a < δ, then it will
be true that

|x − a| < δ.

However, we can make 1/x and, hence, |1/x − 1/a| as large as we want by simply
keeping a < δ fixed and choosing x < a small enough. In particular, |1/x − 1/a|
can be made larger than ε regardless of what ε we start with. Thus, f(x) = 1/x is
not uniformly continuous on (0, 1].
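
The argument above can be checked numerically. The following Python sketch is only an illustration, not part of the proof. The helper name `witness` and the specific choices a = min(δ, 1)/2 and x = a/(1 + 2εa) are assumptions made here; any a < δ and any sufficiently small x < a would serve equally well.

```python
def witness(delta, eps):
    """Return (x, a) in (0, 1] with |x - a| < delta but |1/x - 1/a| > eps.

    Hypothetical helper: a is fixed below delta, then x < a is chosen
    so that 1/x = 1/a + 2*eps (up to rounding).
    """
    a = min(delta, 1.0) / 2.0        # 0 < a < delta and a <= 1/2
    x = a / (1.0 + 2.0 * eps * a)    # 0 < x < a, and 1/x - 1/a = 2*eps
    return x, a


if __name__ == "__main__":
    for delta in (1.0, 0.1, 0.001):
        x, a = witness(delta, eps=1.0)
        # The points stay within delta of each other, yet the values differ by about 2.
        print(delta, abs(x - a) < delta, abs(1 / x - 1 / a))
```

However small δ is taken, the pair returned stays within δ, while the function values differ by more than ε. This is exactly the failure of a single δ to work for all points.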


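Example 3.3.3. Prove that f(x) = 1/x is uniformly continuous on any interval of
the form [r, 1], where r > 0.

Solution: If x and a are in the interval [r, 1], then

\[ \left| \frac{1}{x} - \frac{1}{a} \right| = \frac{|a - x|}{xa} \le \frac{|x - a|}{r^2}. \]

Thus, given ε > 0, if we choose δ = r²ε, then |1/x − 1/a| < ε whenever
x, a ∈ [r, 1] and |x − a| < δ. This implies that f(x) = 1/x is uniformly
continuous on [r, 1].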
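
As a sanity check, here is a minimal Python sketch under stated assumptions. It relies on the standard estimate |1/x − 1/a| = |x − a|/(xa) ≤ |x − a|/r² for x, a ∈ [r, 1], which suggests the candidate δ = r²ε. The function name `worst_gap` and the grid resolution are hypothetical choices, and a finite grid can only illustrate the claim, not prove it.

```python
from itertools import product


def worst_gap(r, eps, steps=200):
    """Largest |1/x - 1/a| over grid points x, a in [r, 1] with |x - a| < delta,
    where delta = r**2 * eps is the candidate suggested by the estimate."""
    delta = r * r * eps
    grid = [r + (1.0 - r) * i / steps for i in range(steps + 1)]
    return max(
        (abs(1.0 / x - 1.0 / a)
         for x, a in product(grid, repeat=2)
         if abs(x - a) < delta),
        default=0.0,
    )


if __name__ == "__main__":
    # A single delta serves every pair of points in [r, 1].
    print(worst_gap(0.1, 0.5), "should be below", 0.5)
```

Note that δ shrinks like r² as r approaches 0, which is consistent with the failure of uniform continuity on (0, 1].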


Conditions Ensuring Uniform Continuity. In the last example, the domains of 
the functions were all closed, bounded intervals. It turns out that, in this situation, 
continuity implies uniform continuity. This is the main theorem of the section. 


Theorem 3.3.4. If f is a continuous function on a closed, bounded interval I, 
then f is uniformly continuous on I. 


Proof. We will prove the contrapositive. Suppose f is not uniformly continuous
on I. Then there is an ε > 0 for which no δ can be found which satisfies (3.3.1).
In particular, none of the numbers 1/n for n ∈ N will suffice for δ. This means
that, for each n, there are numbers x_n, a_n ∈ I such that

|x_n − a_n| < 1/n but |f(x_n) − f(a_n)| ≥ ε.

By the Bolzano-Weierstrass Theorem, some subsequence {x_{n_k}} of the sequence
{x_n} converges to a point x of I. The inequality |x_{n_k} − a_{n_k}| < 1/n_k ≤ 1/k
implies that {a_{n_k}} converges to the same number. Since |f(x_{n_k}) − f(a_{n_k})| ≥ ε, the
sequences {f(x_{n_k})} and {f(a_{n_k})} cannot converge to the same number. However,
they would both have to converge to f(x) if f were continuous at x, by Theorem
3.1.6. Thus, we conclude that f is not continuous at every point of I. □


Consequences of Uniform Continuity. 


Theorem 3.3.5. If f is uniformly continuous on its domain D and if {x_n} is any
Cauchy sequence in D, then {f(x_n)} is also a Cauchy sequence.

Proof. Given ε > 0, by uniform continuity there is a δ > 0 such that

|f(x) − f(y)| < ε whenever x, y ∈ D and |x − y| < δ.

Since {x_n} is Cauchy, there is an N such that

|x_n − x_m| < δ whenever n, m > N.

Combining these two statements tells us that

|f(x_n) − f(x_m)| < ε whenever n, m > N.

Thus, {f(x_n)} is a Cauchy sequence. □

An interval may be closed, open, or half-open. If I is an interval, we denote
by Ī the closed interval consisting of I along with any endpoints of I that may be
missing from I. If I is a bounded interval, then Ī is a closed, bounded interval.


Given a continuous function f on a bounded interval I that is not closed, it
may or may not be possible to extend f to a continuous function on Ī. That is, it
may or may not be possible to give f values at the missing endpoint(s) that make
the new function continuous. The next theorem tells when this can be done.

Theorem 3.3.6. If f is a continuous function on a bounded interval I, which may
not be closed, then f has a continuous extension to Ī if and only if f is uniformly
continuous on I.


Proof. If f has a continuous extension f̃ to Ī, then f̃ is uniformly continuous on
Ī by Theorem 3.3.4. But if a function is uniformly continuous on a set, then it
is also uniformly continuous when restricted to any smaller set. Since f is just f̃
restricted to the smaller domain I, f is uniformly continuous on I.

Conversely, suppose f is uniformly continuous on I. Let a be a missing endpoint
of I (left or right). There are lots of sequences in I which converge to a. Let {a_n}
be one of these. Then {a_n} is a Cauchy sequence in I and so the previous theorem
implies that {f(a_n)} is also a Cauchy sequence. Since Cauchy sequences converge,
we know that there is a y such that f(a_n) → y.

We claim that if {b_n} is any other sequence in I converging to a, then {f(b_n)}
converges to the same number y. We prove this by constructing a new sequence
{c_n} in I, which also converges to a, by interlacing the terms of {a_n} and {b_n}.
That is, we set

c_{2k−1} = a_k,
c_{2k} = b_k.

Since c_n → a, we may argue as before that {f(c_n)} converges to some number.
But one of its subsequences, {f(c_{2k−1})} = {f(a_k)}, converges to y. This implies
that {f(c_n)} must converge to y as must any of its subsequences. In particular
{f(c_{2k})} = {f(b_k)} converges to y. This proves our claim. That is, the number
y = lim f(a_n) is the same no matter what sequence {a_n} in I converging to a is chosen.


We now define a new function f̃ on I ∪ {a}, by setting f̃(a) = y and f̃(x) = f(x)
for each x ∈ I. It is clear from the construction that f̃ will be continuous at a,
since f̃(x_n) → y = f̃(a) for every sequence {x_n} in I ∪ {a} that converges to a.

This proves that a uniformly continuous function on a bounded interval I can
be extended to be continuous on the interval obtained by adjoining one missing
endpoint to I. If the other endpoint is also missing, we simply repeat the process
to get an extension to all of Ī. □


This theorem often provides a quick way to see that a function on a bounded 
interval is not uniformly continuous. 


Example 3.3.7. Show that the function f(«) = is not uniformly continuous 


tS 
on the interval (—1, 1). 

Solution: If f is uniformly continuous on this interval, then the previous
theorem implies that f has a continuous extension to [−1, 1]. However, a continuous
function on a closed bounded interval is bounded. The function f is not bounded
on (−1, 1), and so no extension of it to [−1, 1] can be bounded. Thus, f is not
uniformly continuous.


If the interval J is unbounded, then it is possible for a function on I to be 
uniformly continuous and yet unbounded. 


Example 3.3.8. Show that the function f(x) = √x is uniformly continuous on
[1, +∞).

Solution: If x, y ∈ [1, +∞), then

\[ |\sqrt{x} - \sqrt{y}| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le |x - y|, \]

since √x ≥ 1 > 1/2 and √y ≥ 1 > 1/2 if x, y ∈ [1, +∞). This clearly implies that
f is uniformly continuous on [1, +∞). In fact, given ε > 0, it suffices to choose
δ = ε to obtain

|f(x) − f(y)| < ε whenever x, y ∈ [1, +∞) and |x − y| < δ.


Exercise Set 3.3

1. Is the function f(x) = x² uniformly continuous on (0, 1)? Justify your answer.

2. Is the function f(x) = 1/x² uniformly continuous on (0, 1)? Justify your answer.

3. Is the function f(x) = x² uniformly continuous on (0, +∞)? Justify your answer.

4. Using only the ε−δ definition of uniform continuity, prove that the function


f(x) = i is uniformly continuous on (0,00). 


5. In Example 3.3.8 we showed that √x is uniformly continuous on [1, +∞). Show
that it is also uniformly continuous on [0, 1].

6. Prove that if I and J are overlapping intervals in R (I ∩ J ≠ ∅) and f is a
function, defined on I ∪ J, which is uniformly continuous on I and uniformly
continuous on J, then it is also uniformly continuous on I ∪ J. Use this and
the previous exercise to prove that √x is uniformly continuous on [0, +∞).

7. Prove that if I is a bounded interval and f is an unbounded function defined
on I, then f cannot be uniformly continuous.

8. Let f be a function defined on an interval I and suppose that there are positive
constants K and r such that

|f(x) − f(y)| ≤ K|x − y|^r for all x, y ∈ I.

Prove that f is uniformly continuous.

9. Is the function f(x) = sin 1/x continuous on (0, 1)? Is it uniformly continuous
on (0, 1)? Justify your answers.

10. Is the function f(x) = x sin 1/x uniformly continuous on (0, 1)? Justify your
answer.


3.4. Uniform Convergence 


Uniform convergence is a subject that is both similar to and very different from
uniform continuity. Uniform continuity is a condition on the continuity of a single
function, while uniform convergence is a condition on the convergence of a sequence
of functions.

Sequences of Functions. In calculus we often encounter sequences of functions
as opposed to sequences of numbers. They occur as partial sums of power series,
for example. Other examples are the following (note that x is a variable):

(1) {x/n}, x ∈ R;

(2) {x^n}, x ∈ R;
(3) {ss} a>0; 


(5) {sin nx}, x ∈ [0, 2π).


It is important to have methods to show that various things are preserved by 
passing to the limit of a sequence of functions. If the functions in the sequence are 
all continuous on a certain set D, is the limit continuous on D? Is the integral of 
the limit equal to the limit of the integrals if we are integrating over some interval 
on which all the functions are defined? The answer to both of these questions is 
“yes” provided the convergence is uniform. 


Uniform Convergence. Let {f_n} be a sequence of functions on a set D ⊂ R.
We say that {f_n} converges pointwise to a function f on D if, for each x ∈ D, the
sequence of numbers {f_n(x)} converges to the number f(x). If we write out what
this means in terms of the definition of convergence of a sequence of numbers, we
get the statement in (a) of the following definition. Statement (b) is the definition
of uniform convergence.

Definition 3.4.1. Let {f_n} be a sequence of functions on a set D ⊂ R. Then:

(a) {f_n} is said to converge pointwise to a function f on D if, for each x ∈ D and
each ε > 0, there is an N such that

|f(x) − f_n(x)| < ε whenever n > N.

(b) {f_n} is said to converge uniformly on D to a function f if, for each ε > 0,
there is an N such that

|f(x) − f_n(x)| < ε whenever x ∈ D and n > N.


As with continuity and uniform continuity, the definitions of pointwise
convergence and uniform convergence seem identical until one studies them closely. In
fact, they are very different. In the case of pointwise convergence, x is given along
with ε before N is chosen. Here N may well depend on both ε and x. In the case
of uniform convergence, only ε is given initially; then an N must be chosen which
works for all x. That is, N does not depend on x in this case.

Example 3.4.2. Give an example of a sequence of functions defined on [0, 1] which
converges pointwise on [0, 1] but not uniformly.

Solution: An example is the sequence {f_n} on [0, 1] defined by f_n(x) = x^n,
which is illustrated in Figure 3.4.1. This sequence of functions converges to the
function f which is 0 if x < 1 and is 1 if x = 1. Since the sequence {f_n(x)}
converges to f(x) for each value of x, the sequence {f_n} converges pointwise to f
on [0, 1]. However, the convergence is not uniform on [0, 1]. In fact,

|f_n(x) − f(x)| = x^n if x ∈ [0, 1),

and so, given ε > 0, in order for it to be true that |f_n(x) − f(x)| < ε for all x ∈ [0, 1]
and some n, we would need that

x^n < ε for all x ∈ [0, 1).

However, since x^n is continuous on [0, 1], this would imply that 1 = 1^n ≤ ε (Exercise
3.2.12). Obviously, there are positive numbers ε for which this is not true (any
positive ε < 1). This shows that the convergence of {f_n} on [0, 1] is not uniform.
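
The failure of uniformity can also be seen numerically. The short Python sketch below is an illustration only. The helper `bad_point` and the level 3/4 are assumptions chosen here: for each n it produces a point x in [0, 1) at which x^n is still about 3/4, so no single N can force x^n below ε = 1/2 for all x in [0, 1) at once.

```python
def bad_point(n, level=0.75):
    """Return x in [0, 1) with x**n approximately equal to `level`.

    Hypothetical helper: x is the n-th root of `level`, which lies
    strictly between 0 and 1 whenever 0 < level < 1.
    """
    return level ** (1.0 / n)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        x = bad_point(n)
        # x creeps toward 1 as n grows, while x**n stays near 0.75.
        print(n, x, x ** n)
```

The points returned move toward 1 as n increases, which matches the remark that the trouble is concentrated near x = 1.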


Figure 3.4.1. The Sequence {x^n} Does Not Converge Uniformly on [0, 1].

The problem in the above example is due to what is happening near x = 1. If
we stay away from 1, the situation improves.


Example 3.4.3. If 0 < r < 1, prove that the sequence {f_n}, defined by f_n(x) = x^n,
converges uniformly to 0 on [0, r].

Solution: We have

(3.4.1) |x^n − 0| = x^n ≤ r^n for all x ∈ [0, r].

Now, given ε > 0, we choose N so that

r^n < ε whenever n > N.

This is possible because r^n → 0 if 0 < r < 1. Combining this with (3.4.1) yields

|x^n − 0| < ε whenever x ∈ [0, r] and n > N.

This proves that {x^n} converges uniformly to 0 on [0, r].
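
The choice of N can be made concrete. The following Python sketch is a hedged illustration: the function name `first_index` is hypothetical, and the loop simply searches for the first power of r below ε rather than using a closed formula. Because r^n decreases in n, the N it returns works for every n > N, and by (3.4.1) for every x in [0, r] simultaneously.

```python
def first_index(r, eps):
    """Smallest N >= 0 with r**(N + 1) < eps, assuming 0 < r < 1 and eps > 0.

    Since r**n is decreasing in n, r**n < eps then holds for all n > N.
    """
    N = 0
    while r ** (N + 1) >= eps:
        N += 1
    return N


if __name__ == "__main__":
    r, eps = 0.5, 0.01
    N = first_index(r, eps)
    print(N)  # 6, since 0.5**6 = 0.015625 >= 0.01 > 0.5**7 = 0.0078125
```

The same N depends only on r and ε, never on the point x, which is what uniform convergence on [0, r] requires.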

Uniform Convergence and Continuity. 


Theorem 3.4.4. Let {fn} be a sequence of functions, all of which are defined and 
continuous on a set D. If {fn} converges uniformly to a function f on D, then f 
is continuous on D. 


Proof. If a ∈ D, we will show that f is continuous at a. Given ε > 0, we first use
the uniform convergence to choose an N such that

|f_n(x) − f(x)| < ε/3 whenever x ∈ D, n > N.

We then fix a natural number n > N and use the fact that each f_n is continuous
at a to choose a δ > 0 such that

|f_n(x) − f_n(a)| < ε/3 whenever x ∈ D and |x − a| < δ.

On combining these and using the triangle inequality, we conclude that

|f(x) − f(a)| ≤ |f(x) − f_n(x)| + |f_n(x) − f_n(a)| + |f_n(a) − f(a)|
< ε/3 + ε/3 + ε/3 = ε,

whenever x ∈ D and |x − a| < δ. This proves that f is continuous at a. Since a
was an arbitrary point of D, f is continuous on D. □


Example 3.4.5. Analyze the convergence of the sequence of functions {f_n} defined
on [0, ∞) by

\[ f_n(x) = \frac{1}{1 + nx}. \]

Does the sequence converge pointwise? Does it converge uniformly?

Solution: Since f_n(0) = 1 for all n, the sequence {f_n(x)} converges to 1 at
x = 0. Since each f_n can be rewritten as

\[ f_n(x) = \frac{1/n}{1/n + x} \]

and the numerator of this expression converges to 0 while its denominator converges
to x, the sequence {f_n(x)} converges to 0 if x ≠ 0. Thus, {f_n} converges pointwise
to the function f on [0, ∞) defined by f(x) = 0 if x > 0 and f(0) = 1.

It follows from the previous theorem that the convergence is not uniform,
because f is not continuous on [0, ∞) although each of the functions f_n is continuous
on this interval.


Tests for Uniform Convergence. A sequence {f_n} converges uniformly to f
on a set D if and only if {|f_n − f|} converges uniformly to 0 on D. Thus, it is useful
to have simple tests for when a sequence converges uniformly to 0. We will give
two such tests. One gives conditions which guarantee that a sequence converges
uniformly to 0 and the other gives a condition, which if not true, guarantees that a
sequence does not converge uniformly to 0. Both theorems have very simple proofs
which are left to the exercises.


The following theorem is useful for showing that a sequence converges uni- 
formly. 


Theorem 3.4.6. Let {f_n} be a sequence of functions defined on a set D. If there
is a sequence of numbers {b_n}, such that b_n → 0 and

|f_n(x)| ≤ b_n for all x ∈ D,

then {f_n} converges uniformly to 0 on D.


The following theorem provides a useful test for proving a sequence does not 
converge uniformly. 


Theorem 3.4.7. Let {f_n} be a sequence of functions defined on a set D. If {f_n}
converges uniformly to 0 on D, then {f_n(x_n)} converges to 0 for every sequence
{x_n} of points of D.


Example 3.4.8. If f_n(x) = n/(x + n), prove that {f_n} converges uniformly to 1 on
the interval [0, r] for each positive number r but does not converge uniformly on
[0, ∞).

Solution: We have

\[ |f_n(x) - 1| = \frac{x}{x + n} \le \frac{r}{n} \]

for all x ∈ [0, r]. Since r/n → 0, Theorem 3.4.6 implies that {f_n − 1} converges
uniformly to 0 and, hence, that {f_n} converges uniformly to 1 on [0, r].

On the other hand, if we set x_n = n, then {x_n} is a sequence of numbers in
[0, ∞) and f_n(x_n) = 1/2. Since f_n(x_n) − 1 does not converge to 0, Theorem 3.4.7
implies that {f_n − 1} does not converge uniformly to 0 on [0, ∞) and, hence, that
{f_n} does not converge uniformly to 1 on [0, ∞).
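
Both tests can be illustrated numerically. The Python sketch below assumes f_n(x) = n/(x + n), the form consistent with the values f_n(n) = 1/2 and the bound r/n used in the solution. The names `f` and `sup_error_on_grid` are hypothetical, and a finite grid only approximates the supremum over [0, r].

```python
def f(n, x):
    """Assumed form of the n-th function: f_n(x) = n / (x + n)."""
    return n / (x + n)


def sup_error_on_grid(n, r, steps=1000):
    """Largest |f_n(x) - 1| over a uniform grid of [0, r]."""
    return max(abs(f(n, r * i / steps) - 1.0) for i in range(steps + 1))


if __name__ == "__main__":
    r = 5.0
    for n in (1, 10, 100):
        # On [0, r] the error is bounded by r/n (the sequence b_n of Theorem 3.4.6),
        # while along x_n = n the value stays at 1/2 (the test of Theorem 3.4.7).
        print(n, sup_error_on_grid(n, r), r / n, f(n, n))
```

The first two printed columns shrink together, while the last column never moves from 1/2, which is the obstruction to uniform convergence on the unbounded interval.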


Uniformly Cauchy Sequences. 


Definition 3.4.9. A sequence of functions {f_n} on a set D is said to be uniformly
Cauchy on D if for each ε > 0, there is an N such that

|f_n(x) − f_m(x)| < ε whenever x ∈ D and n, m > N.

If {f_n} is a uniformly Cauchy sequence, then {f_n(x)} is a Cauchy sequence for
each x ∈ D. By Theorem 2.5.8, {f_n(x)} converges. Thus, {f_n} converges pointwise
to some function f on D. The next theorem tells us that the convergence is uniform.
Its proof is left to the exercises.

Theorem 3.4.10. A sequence of functions {f_n} on D is uniformly convergent on
D if and only if it is uniformly Cauchy on D.


Exercise Set 3.4

1. Prove that the sequence {x/n} converges uniformly to 0 on each bounded
interval but does not converge uniformly on R.


2. Prove that the sequence ——— converges uniformly to 0 on R, 
zen 


3. Prove that the sequence {sin(x/n)} converges to 0 pointwise on R but it does
not converge uniformly on R.

4. Prove that the sequence {(sin nx)/n} converges uniformly to 0 on R.

5. Prove that {x^n(1 − x)} converges uniformly to 0 on [0, 1]. Hint: Find where
each of these functions has its maximum on [0, 1].

6. Prove Theorem 3.4.6. 

7. Prove Theorem 3.4.7. 


8. Prove that if {f_n} is a sequence of uniformly continuous functions on a set D
and if this sequence converges uniformly to f on D, then f is also uniformly
continuous.


9. For x ∈ (−1, 1) set

\[ s_n(x) = \sum_{k=0}^{n} x^k. \]

This is the nth partial sum of a geometric series. Prove that

\[ s_n(x) = \frac{1 - x^{n+1}}{1 - x}. \]

10. Prove that the sequence {s_n} of the previous exercise converges uniformly to
1/(1 − x) on each interval of the form [−r, r] with r < 1 but it does not converge
uniformly on (−1, 1).

11. Prove Theorem 3.4.10. Hint: Use an argument like the one in the proof of
Theorem 2.5.8.

12. Prove that if {a_k} is a bounded sequence of numbers and a sequence {s_n} is
defined on (−1, 1) by

\[ s_n(x) = \sum_{k=0}^{n} a_k x^k, \]

then {s_n} converges to a continuous function on (−1, 1). Hint: Prove this
sequence is uniformly Cauchy on each interval [−r, r] for 0 < r < 1.


Chapter 4 


The Derivative 


In this chapter we will prove the standard theorems from calculus concerning
differentiation: theorems such as the Chain Rule, the Mean Value Theorem, and
L'Hôpital's Rule.


We begin with the concept of the limit of a function. 


4.1. Limits of Functions 


Definition 4.1.1. Let I be an open interval, a a point of I, and f a function
defined on I except possibly at a itself. Then we will say the limit of f(x) as x
approaches a is L and write

lim_{x→a} f(x) = L

if, for each ε > 0, there is a δ > 0 such that

|f(x) − L| < ε whenever x ∈ I and 0 < |x − a| < δ.

Note that the condition 0 < |x − a| in the above definition means that, in
defining the limit of f as x approaches a, we only care about values of f at points
other than a itself.

Note also that the domain of f may be larger than I and may not be an interval
at all, but, in order to define the limit of f at a, we want f to be defined at least
at all points, except a itself, in some open interval containing a.


Remark 4.1.2. On comparing the above definition with the definition of continuity
(Definition 3.1.1), we conclude that, if f is defined on an open interval containing
a, then f is continuous at a if and only if lim_{x→a} f(x) = f(a).

This means that if f is not continuous at a (or not defined at a) but it has a
limit L as x approaches a, then we can make f continuous at a by redefining (or
defining) it at a by setting f(a) = L.


Example 4.1.3. Find lim_{x→1} f(x) if f(x) is the function (x³ − 1)/(x − 1) on R \ {1}.

Solution: For x ∈ R \ {1}, we have

\[ f(x) = \frac{x^3 - 1}{x - 1} = x^2 + x + 1. \]

The function on the right is continuous at 1 (since it is a polynomial) and has
the value 3 there. Thus, if we extend f to all of R by giving it the value 3 at
x = 1, then it becomes the continuous function x² + x + 1. By the above remark,
lim_{x→1} f(x) = 3.

Example 4.1.4. Can the function (sin x)/x on R \ {0} be defined at 0 in such a way
that it becomes continuous at 0?

Solution: We learned in calculus that lim_{x→0} (sin x)/x = 1. Thus, if (sin x)/x
is given the value 1 at x = 0, it will be continuous there.


One-sided Limits, Limits at ±∞.

Example 4.1.5. Give an intuitive discussion of the behavior of the function f(x) =
x/|x| as x approaches 0.

Solution: We have f(x) = 1 if x > 0 and f(x) = −1 if x < 0. Thus, as x
approaches 0, f(x) approaches 1 if we keep x to the right of 0, while f(x) approaches
−1 if we keep x to the left of 0. However, lim_{x→0} f(x) does not exist, since in the
definition of limit, we allow x to be on either side of a.

The above example suggests that it may be useful to define one-sided limits
that depend only on the behavior of the function on one side of the point a. If a
function is defined on an unbounded interval, then it may also be useful to discuss
its limit at +∞ or −∞. Correctly formulated, the same definition can be used to
cover the cases of one-sided limits and of limits at ±∞.


Definition 4.1.6. Let f be a function defined on an open interval (a, b), where a
could be −∞ and b could be +∞. We say that the limit from the right of f(x) as
x approaches a is L and write

lim_{x→a+} f(x) = L

if for every ε > 0 there is an m ∈ (a, b) such that

|f(x) − L| < ε whenever a < x < m.

Similarly, we say the limit of f(x) as x approaches b from the left is L and write

lim_{x→b−} f(x) = L

if for every ε > 0 there is an m ∈ (a, b) such that

|f(x) − L| < ε whenever m < x < b.

Note that, if a is finite, then to say that there is an m ∈ (a, b) such that
|f(x) − L| < ε whenever a < x < m is the same thing as saying there is a δ > 0
such that |f(x) − L| < ε whenever |x − a| < δ and x ∈ (a, b) (this is clear if we
let m and δ determine each other by the formula δ = m − a). This is just like the
ordinary definition of limit of f at a except x is restricted to lie to the right of a.
A similar analysis holds for the limit from the left at b in the case where b is finite.

In the case where b = ∞, the condition m < x < b just means that m < x,
while in the case where a = −∞, the condition a < x < m just means that x < m.
Stated this way, the above definition is the traditional definition of limit at ∞ or
at −∞.

For limits at ∞ or −∞, we will simply write “lim_{x→∞} f(x)” or “lim_{x→−∞} f(x)”
rather than “lim_{x→∞−} f(x)” or “lim_{x→−∞+} f(x)”.
zr 007 z—+—cot 

In view of the above discussion, the following theorem is almost obvious. Its 

proof is left to the exercises. 


Theorem 4.1.7. Let I be an open interval and let a be a point of I. If f is defined
on I except possibly at a, then

lim_{x→a} f(x) = L if and only if lim_{x→a−} f(x) = L = lim_{x→a+} f(x).

In other words the limit of f(x) as x approaches a exists if and only if the limits
from the left and the right both exist and are equal. Of course, the limit is then
this common value of the limits from the left and right.


Example 4.1.8. For the function

\[ f(x) = \begin{cases} 1 - x & \text{if } x < 0, \\ \sin x & \text{if } x \ge 0, \end{cases} \]

find lim_{x→0−} f(x), lim_{x→0+} f(x), and lim_{x→0} f(x) if they exist.

Solution: Since, to the left of 0, f agrees with the continuous function 1 − x,
its limit from the left is lim_{x→0} (1 − x) = 1. On the other hand, to the right of
0, f agrees with the continuous function sin x, and so its limit from the right is
lim_{x→0} sin x = sin 0 = 0. Because the limits from the left and the right are not the
same, lim_{x→0} f(x) does not exist.


Example 4.1.9. Find

\[ \lim_{x \to \infty} \frac{x^2 + 3x + 1}{2x^2 - 4}. \]

Solution: We do this just as we would if we were finding the limit of a sequence
as n → ∞. We divide both numerator and denominator by the highest power of x
that occurs. This yields

\[ \frac{x^2 + 3x + 1}{2x^2 - 4} = \frac{1 + 3/x + 1/x^2}{2 - 4/x^2}. \]

From this, we guess that the limit is 1/2. If we want to prove this is true, using
only the above definition, we proceed as follows:

\[ \left| \frac{x^2 + 3x + 1}{2x^2 - 4} - \frac{1}{2} \right| = \left| \frac{3x + 3}{2x^2 - 4} \right|. \]

Now if x > 3, then 2x² − 4 > x² and 3x + 3 < 4x. In this case, it follows from the
above that

\[ \left| \frac{x^2 + 3x + 1}{2x^2 - 4} - \frac{1}{2} \right| < \frac{4x}{x^2} = \frac{4}{x}. \]

Thus, given ε > 0, if we choose m = max(3, 4/ε), then

\[ \left| \frac{x^2 + 3x + 1}{2x^2 - 4} - \frac{1}{2} \right| < \frac{4}{x} < \varepsilon \quad \text{whenever } m < x. \]

This proves that the limit is 1/2, as we expected.
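
The threshold m = max(3, 4/ε) can be spot-checked numerically. This Python sketch is an illustration under stated assumptions: the names `q` and `threshold` are hypothetical, and sampling a few points beyond m only checks the estimate at those points, whereas the inequality argument above covers all x > m.

```python
def q(x):
    """The quotient (x^2 + 3x + 1) / (2x^2 - 4) from the example."""
    return (x * x + 3 * x + 1) / (2 * x * x - 4)


def threshold(eps):
    """The m = max(3, 4/eps) obtained from the estimate |q(x) - 1/2| < 4/x."""
    return max(3.0, 4.0 / eps)


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.001):
        m = threshold(eps)
        samples = [m * t for t in (1.001, 2.0, 10.0, 1000.0)]
        # Every sampled x > m should satisfy |q(x) - 1/2| < eps.
        print(eps, m, all(abs(q(x) - 0.5) < eps for x in samples))
```

The threshold is far from optimal (the true error behaves like 3/(2x) for large x), but the definition of the limit only asks for some m that works.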


Of course, once we prove some theorems about limits, it becomes much easier to 
do limit problems like the one above. It turns out that all the theorems about limits 
of sequences, proved in the last chapter, have analogues for limits of functions. 


Limit Theorems. As was the case with continuity, the limit of a function can 
be characterized in terms of limits of sequences. The following theorem is just like 
Theorem 3.1.6 and is proved the same way. The only difference is that L replaces 
f(a). We will not repeat the proof. 

Theorem 4.1.10. Let (a, b) be a (possibly infinite) interval and let u be a+ or b−
or a point in the interval (a, b). If f is a function, defined on (a, b), then

lim_{x→u} f(x) = L

if and only if f(x_n) → L whenever {x_n} is a sequence of points in (a, b), distinct
from u, with x_n → u.

As was the case with continuity in Section 3.1, this theorem means that each
theorem about convergence of sequences yields a theorem about limits of functions.
For example, the Main Limit Theorem for sequences, together with the previous
theorem, implies the Main Limit Theorem for functions:


Theorem 4.1.11 (Main Limit Theorem). Let (a, b) be a (possibly infinite)
interval, let u = a+ or b− or a point in the interval (a, b), and let c be a constant. Let
f and g be functions defined on (a, b). If lim_{x→u} f(x) = K and lim_{x→u} g(x) = L,
then

(a) lim_{x→u} c = c;

(b) lim_{x→u} c f(x) = cK;

(c) lim_{x→u} (f(x) + g(x)) = K + L;

(d) lim_{x→u} f(x)g(x) = KL;

(e) lim_{x→u} f(x)/g(x) = K/L, provided L ≠ 0.


There is also a theorem about the limit of a composite function which is similar 
to Theorem 3.1.11 and has the same proof. 


Theorem 4.1.12. Let (a, b) be a (possibly infinite) interval and let u = a+ or b−.
If g is defined on (a, b) and lim_{x→u} g(x) = L, f is defined on an interval containing
L and the image of g, and f is continuous at L, then

lim_{x→u} f(g(x)) = f(L).

Proof. Let {x_n} be a sequence in (a, b) converging to u. Then, by Theorem 4.1.10,
lim_{x→u} g(x) = L implies g(x_n) → L. Then, by Theorem 3.1.6, the continuity of f
at L implies that f(g(x_n)) → f(L). Again using Theorem 4.1.10, we conclude that
lim_{x→u} f(g(x)) = f(L). □


Example 4.1.13. Prove that if g is a non-negative function, defined on an interval
I except possibly at one point a ∈ I and if lim_{x→a} g(x) = L, then

lim_{x→a} g^r(x) = L^r for all rational r > 0.

Solution: If r > 0 is rational and we set f(x) = x^r, then f is continuous on
[0, ∞) by Theorem 3.1.7. Since g^r(x) = f(g(x)), it follows immediately from the
previous theorem that lim_{x→a} g^r(x) = L^r.

Infinite Limits. Just as with sequences, for a function f it is sometimes useful
to know that, even though f may not have a finite limit as x → u, it does approach
either +∞ or −∞. In analogy with Definition 2.4.4, we define infinite limits as
follows.


Definition 4.1.14. If f is a function defined on an interval (a, b), then we say
lim_{x→a+} f(x) = ∞ if, for each M, there is an m ∈ (a, b) such that

f(x) > M whenever a < x < m.

Infinite limits at b− and what it means for the limit to be −∞ are defined
analogously (see the exercises).

If c ∈ (a, b) and lim_{x→c−} f(x) and lim_{x→c+} f(x) are both ∞, then we write
lim_{x→c} f(x) = ∞. The analogous statement holds if the limits are both −∞.


The following theorem reduces statements about infinite limits to statements 
about finite limits. Its proof is left to the exercises. 


Theorem 4.1.15. Let f be defined on (a, b) and let u = a+ or b− or a point in
the interval (a, b). If f is positive on (a, b), then

lim_{x→u} f(x) = ∞ if and only if lim_{x→u} 1/f(x) = 0.

Similarly, if f is negative on (a, b), then

lim_{x→u} f(x) = −∞ if and only if lim_{x→u} 1/f(x) = 0.

Example 4.1.16. Analyze the behavior of f(a) = — as x approaches 1. 
=e 
i 


Solution: We have lim_{x→1} 1/f(x) = 0, and so the limits of 1/f from the left
and the right at 1 are both 0. On (0, 1) the function f is
positive and so lim_{x→1−} f(x) = ∞ by the previous theorem. On (1, ∞) the function
f is negative and so lim_{x→1+} f(x) = −∞, also by the previous theorem.


Exercise Set 4.1 


In each of the next six exercises find the indicated limit and prove that your 
answer is correct. 


re) 
ca eee elt 
1. lim : 
rol £ 




x 
2. li 
roe @—1 


4 3/2 
si tm, (Fz) ; 
2 


ao 


8. If f(x) = sin 1/x, do lim_{x→0+} f(x) and lim_{x→0−} f(x) exist?

9. If, in Example 4.1.8, f is defined to be −x for x < 0 instead of 1 − x, does
lim_{x→0} f(x) exist? Why?


10. Prove Theorem 4.1.7. 


11. Let f be defined on a bounded interval (a, b) and let u be a+, b−, or a point
of (a, b). Prove that if lim_{x→u} f(x) exists and is positive, then there is a δ > 0
such that f(x) > 0 whenever |x − u| < δ and x ∈ (a, b). Hint: Recall the proof
of Theorem 2.2.3.

12. Let f be a non-negative function on an interval (a, b) and let u = a+ or b−. If
lim_{x→u} f(x) exists, prove that it is a non-negative number.

13. Let a and b be extended real numbers with a < b. Prove that if f is a bounded,
monotone function on the interval (a, b), then lim_{x→a+} f(x) and lim_{x→b−} f(x) both
exist and are finite.

14. Give an appropriate definition for the statement lim_{x→b−} f(x) = −∞.


15. Prove Theorem 4.1.15 


4.2. The Derivative 


The definition of the derivative is familiar from calculus. 


Definition 4.2.1. Let f be a function defined on an open interval containing a ∈ R.
If

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

exists and is finite, then we denote it by f′(a), and we say f is differentiable at a
with derivative f′(a). If f is defined and differentiable at every point of an open
interval I, then we say that f is differentiable on I.

The derivative f′ of f is a new function with domain consisting of those points
in the domain of f at which f is differentiable.

Remark 4.2.2. When convenient, we will make the change of variables h = x − a
and write the derivative in the form

\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \tag{4.2.1} \]

Equivalently, when it is convenient to use x for the independent variable in the
function f′, we will write the derivative in the form

\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]


We don’t intend to repeat the computation of the derivatives of all the ele- 
mentary functions. This is done in calculus. We will assume the student knows 
how to differentiate polynomials, rational functions, trigonometric functions, in- 
verse trigonometric functions, and exponentials and logarithms. We will, however, 
compute a couple of derivatives directly from the above definition, just to remind 
the student of how this is done, and we will occasionally compute a derivative, as 
an example, to illustrate the use of some theorem. 


Example 4.2.3. If $f(x) = x^3$, find the derivative of f using just Definition 4.2.1.
Solution: We have
$$f'(a) = \lim_{x\to a}\frac{x^3-a^3}{x-a} = \lim_{x\to a}\frac{(x-a)(x^2+xa+a^2)}{x-a} = \lim_{x\to a}(x^2+xa+a^2) = 3a^2.$$
Thus, $f'(a) = 3a^2$.
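
The limit in Example 4.2.3 can also be watched numerically. The following is only an illustrative sketch: the helper names `difference_quotient` and `cube` are hypothetical, and floating-point arithmetic merely suggests the limit, it does not prove it.

```python
def difference_quotient(f, a, x):
    """Return (f(x) - f(a)) / (x - a), the quotient whose limit defines f'(a)."""
    return (f(x) - f(a)) / (x - a)


def cube(x):
    """The function f(x) = x^3 from Example 4.2.3."""
    return x ** 3


if __name__ == "__main__":
    a = 2.0
    for k in range(1, 7):
        x = a + 10.0 ** (-k)
        # The printed quotients should drift toward 3 * a**2 as x approaches a.
        print(x, difference_quotient(cube, a, x))
```

For x close to a the quotient equals $x^2 + xa + a^2$ up to rounding, which is why the printed values settle near $3a^2$.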


Example 4.2.4. If $f(x) = \sqrt{x}$, find $f'(x)$ for x > 0 using just Definition 4.2.1.
Solution: We have
$$f'(x) = \lim_{h\to 0}\frac{\sqrt{x+h}-\sqrt{x}}{h} = \lim_{h\to 0}\frac{x+h-x}{h(\sqrt{x+h}+\sqrt{x})} = \lim_{h\to 0}\frac{1}{\sqrt{x+h}+\sqrt{x}} = \frac{1}{2\sqrt{x}}.$$
Thus, $f'(x) = \dfrac{1}{2\sqrt{x}}$.


Differentiation Theorems. We will use what we know about limits to prove the
main theorems concerning differentiation. Some of these are proved in the typical
calculus course and some are not.
Theorem 4.2.5. If f is differentiable at a, then f is continuous at a.
Proof. If f is defined in an open interval containing a and x and if $x \ne a$, then
$$f(x) = f(a) + \frac{f(x)-f(a)}{x-a}\,(x-a).$$
We take the limit of both sides as $x \to a$. If f is differentiable at a, then
$$\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$$
exists and is finite. Since $\lim_{x\to a}(x-a) = 0$, this implies that
$\lim_{x\to a} f(x) = f(a)$. Thus, f is continuous at a. □




Theorem 4.2.6. Let f and g be functions defined on an open interval I containing
a and suppose f and g are both differentiable at a and c is a constant. Then cf,
f + g, and fg are differentiable at a, as is f/g provided $g(a) \ne 0$, and

(a) $(cf)'(a) = c f'(a)$;
(b) $(f+g)'(a) = f'(a) + g'(a)$;
(c) $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$;
(d) $\left(\dfrac{f}{g}\right)'(a) = \dfrac{f'(a)g(a) - f(a)g'(a)}{g(a)^2}$.


Proof. We will prove (c) and (d) and leave (a) and (b) to the exercises.
To prove (c), we write
$$\frac{f(x)g(x)-f(a)g(a)}{x-a} = \frac{f(x)-f(a)}{x-a}\,g(x) + f(a)\,\frac{g(x)-g(a)}{x-a}. \tag{4.2.2}$$
By the previous theorem, $\lim_{x\to a} g(x) = g(a)$, and so the Main Limit Theorem
implies that the limit of the right side of (4.2.2) as $x \to a$ exists and is equal to
$f'(a)g(a) + f(a)g'(a)$. Thus, the limit of the left side of this equality as $x \to a$
exists as well. Hence, $(fg)'(a)$ exists and is equal to $f'(a)g(a) + f(a)g'(a)$.

To prove part (d), we first prove that 1/g is differentiable at a and
$$\left(\frac{1}{g}\right)'(a) = -\frac{g'(a)}{g(a)^2}.$$
In fact,
$$\frac{1/g(x) - 1/g(a)}{x-a} = \frac{g(a)-g(x)}{g(a)g(x)(x-a)} = -\frac{g(x)-g(a)}{x-a}\cdot\frac{1}{g(a)g(x)}.$$
If we take the limit of both sides and use the Main Limit Theorem, the conclusion
is that $(1/g)'(a)$ exists and is equal to $-\dfrac{g'(a)}{g(a)^2}$, as claimed.

Now part (d) of the theorem follows from the computation
$$\left(\frac{f}{g}\right)'(a) = \left(f\cdot\frac{1}{g}\right)'(a) = f'(a)\,\frac{1}{g(a)} - f(a)\,\frac{g'(a)}{g(a)^2} = \frac{f'(a)g(a)-f(a)g'(a)}{g(a)^2}.$$
□

The Chain Rule. 


Theorem 4.2.7. Suppose g is defined in an open interval I containing a and f
is defined in an open interval containing g(I). If g is differentiable at a and f is
differentiable at g(a), then $f \circ g$ is differentiable at a and
$$(f\circ g)'(a) = f'(g(a))\,g'(a).$$
Proof. We let b = g(a) and we define a function h by
$$h(y) = \begin{cases} \dfrac{f(y)-f(b)}{y-b} & \text{if } y \ne b, \\[1ex] f'(b) & \text{if } y = b. \end{cases}$$
Then, since
$$\lim_{y\to b}\frac{f(y)-f(b)}{y-b} = f'(b),$$
the function h is continuous at b = g(a). Furthermore,
$$\frac{f(g(x))-f(g(a))}{x-a} = h(g(x))\,\frac{g(x)-g(a)}{x-a}.$$
Since h is continuous at b = g(a) and g is continuous at a, we conclude that h(g(x))
is continuous at x = a. Thus, if we take the limit of both sides of the above identity,
we conclude that
$$(f\circ g)'(a) = \lim_{x\to a}\frac{f(g(x))-f(g(a))}{x-a} = h(g(a))\lim_{x\to a}\frac{g(x)-g(a)}{x-a} = f'(g(a))\,g'(a).$$
□
Example 4.2.8. Find $(\sin\sqrt{x})'$ using the Chain Rule.
Solution: The derivative of sin is cos and the derivative of $\sqrt{x}$ is $\dfrac{1}{2\sqrt{x}}$. Thus,
by the Chain Rule,
$$(\sin\sqrt{x})' = (\cos\sqrt{x})\,\frac{1}{2\sqrt{x}} = \frac{\cos\sqrt{x}}{2\sqrt{x}}.$$
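
As a sanity check on Example 4.2.8, the Chain Rule value can be compared with a symmetric difference quotient. This is a rough numerical sketch, not part of the proof; the names `chain_rule_value`, `composite`, and `central_difference` are hypothetical helpers, and the step size `h` is an arbitrary choice.

```python
import math


def composite(x):
    """The composite function sin(sqrt(x)), defined for x >= 0."""
    return math.sin(math.sqrt(x))


def chain_rule_value(x):
    """f'(g(x)) * g'(x) with f = sin and g = sqrt, valid for x > 0."""
    return math.cos(math.sqrt(x)) / (2.0 * math.sqrt(x))


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (0.5, 1.0, 4.0):
        print(x, chain_rule_value(x), central_difference(composite, x))
```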
Derivative of an Inverse Function. If f is continuous and strictly monotone
on an interval I, then it has a continuous inverse function g, defined on J = f(I),
such that g(J) = I (Theorem 3.2.6). If I is an open interval and a is a point of I,
then J is also an open interval and $b = f(a) \in J$ (Exercise 4.2.5).


Theorem 4.2.9. If f is continuous and strictly monotone on an open interval I
containing a, f is differentiable at a, and $f'(a) \ne 0$, then the inverse function g of
f is differentiable at b = f(a) and
$$g'(b) = \frac{1}{f'(a)}.$$


Proof. For $y \in J$, we set $x = g(y) \in I$. Then f(x) = y. We also have b = f(a)
and a = g(b). Then
$$\frac{g(y)-g(b)}{y-b} = \frac{x-a}{f(x)-f(a)}.$$
If we denote by h the function of x on the right, then, since f is strictly monotone
on I, h is defined everywhere on I except at x = a. Since $\lim_{x\to a} h(x) = \dfrac{1}{f'(a)}$, the
function h will be defined and continuous at a if we give it the value $\dfrac{1}{f'(a)}$ there.
Then
$$\frac{g(y)-g(b)}{y-b} = h(g(y)).$$
If we pass to the limit as $y \to b$, then, by Theorem 4.1.12, the expression on the
right has limit $h(g(b)) = \dfrac{1}{f'(g(b))}$, since g is continuous at b. This implies the
expression on the left has the same limit, which means that $g'(b)$ exists and equals
$\dfrac{1}{f'(a)}$. □


Example 4.2.10. Find the derivative of $\sin^{-1}(x)$.
Solution: The function sin, when restricted to the domain $[-\pi/2, \pi/2]$, is
strictly increasing. Its inverse function $\sin^{-1}(x)$ is also increasing and has domain
$[-1, 1]$, the image of $[-\pi/2, \pi/2]$ under sin. Thus, $\sin^{-1}$ has a non-negative derivative
on $(-1, 1)$ and, by Theorem 4.2.9, it is given by
$$(\sin^{-1})'(x) = \frac{1}{\cos(\sin^{-1}(x))} = \frac{1}{\sqrt{1-\sin^2(\sin^{-1}(x))}} = \frac{1}{\sqrt{1-x^2}},$$
since $\sin(\sin^{-1}(x)) = x$.

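
The inverse-function formula of Theorem 4.2.9 can be evaluated directly for sin and its inverse and compared with the closed form found in Example 4.2.10. The sketch below is illustrative only; `inverse_derivative` is a hypothetical helper, and it simply assumes the hypotheses of the theorem hold at the point supplied.

```python
import math


def inverse_derivative(f_prime, g, b):
    """Value 1 / f'(g(b)) from Theorem 4.2.9; assumes f'(g(b)) is non-zero."""
    return 1.0 / f_prime(g(b))


if __name__ == "__main__":
    for b in (-0.5, 0.0, 0.3, 0.9):
        by_theorem = inverse_derivative(math.cos, math.asin, b)
        closed_form = 1.0 / math.sqrt(1.0 - b * b)
        print(b, by_theorem, closed_form)
```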
Exercise Set 4.2 


1. Using just the definition of the derivative, show that the derivative of 1/x is
$-1/x^2$.
2. Using just the definition of the derivative, find $(x^2 + 3x)'$.


3. Show how to derive the expression for the derivative of tan x if you know the
derivatives of sin x and cos x.


4. Using theorems from this section, find the derivative of tan ( 3 i) 
a 4 


5. We know that the image of a closed interval under a continuous function is a
closed interval or a point (Theorem 3.2.4). Show that the image of an open
interval under a continuous, strictly monotone function is an open interval.

6. If $f \circ g \circ h(x) = f(g(h(x)))$ is the composition of three functions, find an
expression for its derivative. You may use the Chain Rule.


7. Using Theorem 4.2.9, derive the expression for the derivative of $\sqrt{x}$.


8. Using Theorem 4.2.9, derive the expression for the derivative of $\tan^{-1} x$.


9. Prove that if f is defined on an open interval I and has a positive derivative at
a point $a \in I$, then there is an open interval J, containing a and contained in
I, such that f(x) < f(a) < f(y) whenever $x, y \in J$ and x < a < y. Hint: See
Exercise 4.1.11.


10. If f is a monotone function on an interval and g is its inverse function, then
$$f(g(y)) = y$$
for every y in the domain J of g. Use the Chain Rule on this identity to derive
the expression for the derivative of the inverse function g. This argument is
not a substitute for the proof in Theorem 4.2.9. Why?


11. Is the function defined by
$$f(x) = \begin{cases} x\sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$
differentiable at 0? How about the function
$$f(x) = \begin{cases} x^2\sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0? \end{cases}$$


12. Is the function defined by 


differentiable at 0?


4.3. The Mean Value Theorem 


Critical Points. The proof of the Mean Value Theorem rests on the fact that a
continuous function on a closed bounded interval [a,b] takes on its maximum and
minimum values only at critical points. A critical point for f on [a,b] is a point
$c \in [a,b]$ which satisfies one of the following:

(1) c is an endpoint (a or b);
(2) c is a stationary point, meaning $c \in (a,b)$ and $f'(c) = 0$; or
(3) c is a singular point, meaning $c \in (a,b)$ and $f'(c)$ does not exist.

Theorem 4.3.1. If f is a continuous function on a closed bounded interval [a,b]
and $c \in [a,b]$ is a point at which f assumes a maximum or a minimum value on
[a,b], then c is a critical point for f on [a,b].


Proof. Assume f has a maximum at c. The proof in the case where it has a
minimum is the same, except that the inequalities reverse.

We will prove that if c is not an endpoint or a singular point, then it must be
a stationary point. This implies that it has to be one of the three.

If c is not an endpoint and not a singular point, then a < c < b and f has a
derivative at c. Since $f(x) \le f(c)$ for all $x \in [a,b]$, we have
$$\frac{f(x)-f(c)}{x-c}\ \begin{cases} \le 0 & \text{for } x > c, \\ \ge 0 & \text{for } x < c. \end{cases}$$
It follows from Exercise 4.1.12 that
$$\lim_{x\to c^+}\frac{f(x)-f(c)}{x-c} \le 0 \quad\text{and}\quad \lim_{x\to c^-}\frac{f(x)-f(c)}{x-c} \ge 0.$$
Since these two one-sided limits must be equal if the limit itself exists, we conclude
that the limit must be 0. That is, $f'(c) = 0$. Hence c is a stationary point. □


Figure 4.3.1. Three Choices for the c in the Mean Value Theorem.


The Mean Value Theorem. The Mean Value Theorem is one of the most heavily
used tools of calculus. It says that if f is continuous on [a,b] and differentiable
on (a,b), then for at least one point between a and b the graph of f has tangent line
parallel to the line joining (a, f(a)) to (b, f(b)); this may happen at several points
(see Figure 4.3.1). More precisely,


Theorem 4.3.2. If a function f is continuous on the closed interval [a,b] and
differentiable on the open interval (a,b), then there is at least one point $c \in (a,b)$
such that
$$f'(c) = \frac{f(b)-f(a)}{b-a}. \tag{4.3.1}$$


Proof. The function whose graph is the line joining (a, f(a)) to (b, f(b)) is
$$g(x) = f(a) + \frac{f(b)-f(a)}{b-a}\,(x-a).$$
If we subtract this from f, the result is the function s, where
$$s(x) = f(x) - f(a) - \frac{f(b)-f(a)}{b-a}\,(x-a).$$
The function s is also continuous on [a,b] and differentiable on (a,b). By Theorem
3.2.1, s assumes both a maximum value and a minimum value on [a,b]. However,
$$s(a) = s(b) = 0,$$
and so s is either identically zero or it assumes a non-zero maximum or a non-zero
minimum on (a,b). In each of these cases, s has a critical point in (a,b). Let c be
such a critical point. Since s is differentiable on (a,b), c must be a point at which
$s'$ is 0. Thus,
$$s'(c) = f'(c) - \frac{f(b)-f(a)}{b-a} = 0,$$
which implies that c satisfies (4.3.1). □
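
For a concrete function, a point c as in (4.3.1) can be located numerically. The sketch below uses bisection on $f'(t)$ minus the secant slope, which requires extra assumptions that the Mean Value Theorem itself does not make: here $f'$ is assumed continuous and the difference is assumed to change sign on the interval. The name `mean_value_point` is a hypothetical helper.

```python
import math


def mean_value_point(f_prime, slope, lo, hi, tol=1e-12):
    """Bisection for a c in [lo, hi] with f_prime(c) = slope.

    Assumes f_prime is continuous and f_prime - slope changes sign on [lo, hi].
    """
    def gap(t):
        return f_prime(t) - slope

    if gap(lo) * gap(hi) > 0:
        raise ValueError("f_prime - slope does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # f(x) = x^3 on [0, 2]: the secant slope is (8 - 0) / (2 - 0) = 4.
    c = mean_value_point(lambda t: 3.0 * t * t, 4.0, 0.0, 2.0)
    print(c, 2.0 / math.sqrt(3.0))  # both values should agree closely
```

For $f(x) = x^3$ on [0, 2], solving $3c^2 = 4$ by hand gives $c = 2/\sqrt{3}$, which is the value the bisection approaches.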


The Mean Value Theorem has a wide variety of applications. Many of the 
frequently used facts that we take for granted in calculus are direct consequences 
of this theorem. It is also used to prove many new facts that go beyond standard 
calculus material. 




Functions with Vanishing Derivative. 


Theorem 4.3.3. If f is a differentiable function on an open interval (a,b) and $f'$
is identically 0 on (a,b), then f is a constant.


Proof. Let x, y be any two points of (a,b) with x < y. Then the Mean Value
Theorem implies that there is a number c between x and y such that
$$f'(c) = \frac{f(y)-f(x)}{y-x}.$$
Since $f'(c) = 0$, this implies that f(x) - f(y) = 0, or f(x) = f(y). Thus, f has the
same value at any two points of (a,b) and this means that it is constant. □


Corollary 4.3.4. If f and g are differentiable functions on (a,b) and $f'(x) = g'(x)$
for all $x \in (a,b)$, then there is a constant c such that f(x) = g(x) + c on (a,b).


Proof. We apply the previous theorem to f - g. □


Another way to say this corollary is: if a function h has an antiderivative on
(a,b), then any two of its antiderivatives differ by a constant. We use this fact all
the time in integration theory.


Monotone Functions. 


Theorem 4.3.5. If f is a function which is continuous on a closed interval [a,b]
and differentiable on the open interval (a,b), then f is increasing on [a,b] if $f'(x) > 0$
for all $x \in (a,b)$, while f is decreasing on [a,b] if $f'(x) < 0$ for all $x \in (a,b)$.


Proof. If x and y are any two points of [a,b] with x < y, then the Mean Value
Theorem tells us there is a $c \in (x,y) \subset (a,b)$ at which
$$f'(c) = \frac{f(y)-f(x)}{y-x}.$$
Since the denominator is positive, this means that $f'(c)$ and f(y) - f(x) have the
same sign. This implies that f is strictly increasing (resp. decreasing) on [a,b] if
$f'(c)$ is positive (resp. negative) for all $c \in (a,b)$. □


This is the basis for the familiar graphing technique which uses the sign of the
derivative of f to determine intervals on which f is increasing or decreasing.


The converse of Theorem 4.3.5 is not true, since a function which is increasing
on an interval (a,b) can have a derivative that is 0 at some points of (a,b) (for
example, $f(x) = x^3$ is increasing on $(-\infty, +\infty)$, but its derivative is 0 at 0). However,
the closely related theorem which comes next is an "if and only if" theorem.
Its proof is left to the exercises.


A function f on an interval I is said to be non-decreasing (non-increasing) on I
if $f(x) \le f(y)$ ($f(x) \ge f(y)$) whenever x < y and $x, y \in I$.


Theorem 4.3.6. Let f be a continuous function on [a,b] which is differentiable on
(a,b). Then f is non-decreasing on [a,b] if and only if $f'(x) \ge 0$ for all $x \in (a,b)$,
while f is non-increasing on [a,b] if and only if $f'(x) \le 0$ for all $x \in (a,b)$.




Example 4.3.7. Find the intervals on which the function $f(x) = x^3 - 3x + 5$ is
increasing, decreasing.

Solution: The derivative of f is $f'(x) = 3x^2 - 3 = 3(x-1)(x+1)$. This
function is positive for x > 1 and x < -1 and is negative for -1 < x < 1. Thus,
by Theorem 4.3.5, f is increasing on $(-\infty, -1]$ and $[1, +\infty)$ and it is decreasing on
$[-1, 1]$.
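
The conclusion of Example 4.3.7 can be spot-checked by sampling f on each interval. This is only a finite sample and proves nothing by itself; the helpers `sample`, `strictly_increasing`, and `strictly_decreasing` are hypothetical names introduced for the illustration.

```python
def f(x):
    """The function of Example 4.3.7."""
    return x ** 3 - 3 * x + 5


def f_prime(x):
    """Its derivative, 3x^2 - 3."""
    return 3 * x ** 2 - 3


def sample(lo, hi, n=8):
    """Return n + 1 evenly spaced points from lo to hi, endpoints included."""
    return [lo + (hi - lo) * k / n for k in range(n + 1)]


def strictly_increasing(values):
    """True if each value is smaller than the next one."""
    return all(u < v for u, v in zip(values, values[1:]))


def strictly_decreasing(values):
    """True if each value is larger than the next one."""
    return all(u > v for u, v in zip(values, values[1:]))


if __name__ == "__main__":
    print(strictly_increasing([f(x) for x in sample(-3.0, -1.0)]))
    print(strictly_decreasing([f(x) for x in sample(-1.0, 1.0)]))
    print(strictly_increasing([f(x) for x in sample(1.0, 3.0)]))
```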


Example 4.3.8. Prove that sin x < x for all x > 0.

Solution: Let f(x) = x - sin x. Then f(0) = 0 and $f'(x) = 1 - \cos x \ge 0$ for all
x. In fact, $f'(x) > 0$ except at multiples of $2\pi$. By Theorem 4.3.5, f is increasing
on $[0, 2\pi]$. Since it is 0 at x = 0, it must be positive on $(0, 2\pi]$. Thus, sin x < x for
$x \in (0, 2\pi]$. It is obvious that sin x < x for $x > 2\pi$ (since $\sin x \le 1$ for all x).


Uniform Continuity. We know that a continuous function on a closed, bounded 
interval J is uniformly continuous. If the interval J is not closed or not bounded, 
then continuous functions on J need not be uniformly continuous. However, we 
have the following application of the Mean Value Theorem: 


Theorem 4.3.9. If f is a differentiable function on a (possibly infinite) open
interval (a,b) and if $f'$ is bounded on (a,b), then f is uniformly continuous on (a,b).


Proof. Let M be an upper bound for $|f'|$ on (a,b). Then $|f'(x)| \le M$ for all
$x \in (a,b)$. By the Mean Value Theorem, if $x, y \in (a,b)$, then there is a c between
x and y such that
$$f'(c) = \frac{f(x)-f(y)}{x-y}.$$
If we take the absolute value of both sides and multiply by |x - y|, this yields
$$|f(x)-f(y)| = |f'(c)|\,|x-y| \le M|x-y|.$$
Thus, given $\epsilon > 0$, if we choose $\delta = \epsilon/M$, then
$$|f(x)-f(y)| < \epsilon \quad\text{whenever}\quad |x-y| < \delta.$$
This proves that f is uniformly continuous on (a,b). □
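
The choice $\delta = \epsilon/M$ in the proof of Theorem 4.3.9 can be tried out on sin, whose derivative is bounded by M = 1. The following sketch checks a finite grid of points only, so it is an illustration rather than a verification; `delta_for` and `check_uniform` are hypothetical helper names, and the grid and the fractions of $\delta$ are arbitrary choices.

```python
import math


def delta_for(epsilon, bound):
    """The delta = epsilon / M used in the proof; assumes bound M > 0."""
    return epsilon / bound


def check_uniform(func, bound, epsilon, points, fractions=(0.9, 0.5, 0.1)):
    """Check |func(x) - func(y)| < epsilon for sampled pairs with |x - y| < delta."""
    delta = delta_for(epsilon, bound)
    for x in points:
        for t in fractions:
            for y in (x - t * delta, x + t * delta):
                if abs(func(x) - func(y)) >= epsilon:
                    return False
    return True


if __name__ == "__main__":
    grid = [0.37 * k for k in range(-200, 201)]
    print(check_uniform(math.sin, 1.0, 1e-3, grid))
```

The same $\delta$ works at every grid point, which is the point of uniform continuity: the choice depends on $\epsilon$ and M but not on x.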


Exercise Set 4.3 


1. If f is a continuous function on [-1, 1] which is differentiable on (-1, 1) and
satisfies f(-1) = 0, f(0) = 0, and f(1) = 1, then show that $f'$ takes on the
values 0, 1/2, and 1 on (-1, 1).

2. Prove that $|\sin x - \sin y| \le |x - y|$ for all $x, y \in \mathbb{R}$.

3. If r > 0, prove that $\ln y - \ln x \le \dfrac{y-x}{r}$ if $r \le x \le y$.


4. Suppose f is a continuous function on $[0,\infty)$ which is differentiable on $(0,\infty)$.
If f(0) = 0 and $|f'(x)| \le M$ for all $x \in (0,\infty)$, then prove that $|f(x)| \le Mx$
on $[0,\infty)$.


5. Prove that if f is a differentiable function on $(0,\infty)$ and f and $f'$ both have finite
limits at $\infty$, then $\lim_{x\to\infty} f'(x) = 0$. Hint: Apply the Mean Value Theorem to
f for large values of a and b.


6. If $f(x) = 2x^3 + 3x^2 - 12x + 5$, find the intervals on which f is increasing and
those on which it is decreasing.


7. Prove that $\ln x \le x - 1$ for all x > 0. Hint: Analyze where $x - 1 - \ln x$ is
increasing and where it is decreasing.


8. Find where $e^{-x}x^{e}$ is increasing and where it is decreasing. Which is bigger, $e^{\pi}$
or $\pi^{e}$?


9. Prove Theorem 4.3.6.


10. Suppose f is a differentiable function on an interval (a,b) and that $f'$ takes
on both positive and negative values on (a,b). Prove that $f'$ must take on the
value 0 as well. Hint: Show that if $f'(x) > 0$ and $f'(y) < 0$ for points x, y with
a < x < y < b, then the maximum of f on [x,y] occurs at some point strictly
between x and y; the same argument will show that if $f'(x) < 0$ and $f'(y) > 0$,
then the minimum of f on [x,y] occurs at a point strictly between x and y.


11. Use the result of the previous exercise to show that if f is differentiable on
(a,b) and $f'$ takes on two values c and d on (a,b), then it takes on every value
between c and d. This is the Intermediate Value Theorem for Derivatives. Note
that we do not assume $f'$ is continuous on (a,b).


12. Let f be differentiable on $\mathbb{R}$. Prove that if there is an r < 1 such that $|f'(x)| \le r$
for all $x \in \mathbb{R}$, then $|f(x) - f(y)| \le r|x - y|$ for all $x, y \in \mathbb{R}$. A function with
this property is called a contraction mapping.


13. Let f satisfy the conditions of the previous exercise. Show there is a fixed point
for f, that is, an $x \in \mathbb{R}$ such that f(x) = x. Hint: Construct a sequence $\{x_n\}$
inductively by setting $x_1 = 0$ and $x_{n+1} = f(x_n)$. Show that this sequence is
Cauchy and that it converges to a fixed point for f.


14. Prove that if f is increasing on [a,b] and on [b,c], then f is also increasing on
[a,c].


15. The following is a partial converse to Theorem 4.3.9: prove that if f is differentiable
on a, possibly infinite, interval (a,b) and if $\lim_{x\to b} f'(x) = \infty$, then f is not
uniformly continuous on (a,b). The same conclusion holds if $\lim_{x\to a} f'(x) = \infty$.


16. Show that ln x is uniformly continuous on $[1,\infty)$ but not on (0, 1].

4.4. L'Hôpital's Rule


In this section we prove the familiar L'Hôpital's Rule, a tool from calculus that is useful in
calculating limits of indeterminate forms. It has two forms, depending on whether
the indeterminate form is of type 0/0 or of type $\infty/\infty$. The proof uses the following
generalization of the Mean Value Theorem.




Cauchy’s Mean Value Theorem. 


Theorem 4.4.1. Let f and g be functions which are continuous on a closed,
bounded interval [a,b] and differentiable on (a,b). Assume that $g'(x) \ne 0$ for all
$x \in (a,b)$. Then there exists $c \in (a,b)$ such that
$$\frac{f(b)-f(a)}{g(b)-g(a)} = \frac{f'(c)}{g'(c)}. \tag{4.4.1}$$


Proof. We begin by observing that g is strictly monotone on [a,b]. This follows
from the fact that $g'$ is never 0 on (a,b). If $g'$ is never 0, then it cannot take on
both positive and negative values on (a,b) (Exercise 4.3.10). Thus, it is always
positive or always negative, and this implies that g is strictly monotone on [a,b].
In particular, $g(b) \ne g(a)$.

The proof now follows the same strategy as the proof of the ordinary Mean
Value Theorem (Theorem 4.3.2). The only difference is that x - a and b - a are
replaced by g(x) - g(a) and g(b) - g(a) in the definition of the function s. Thus,
we set
$$s(x) = f(x) - f(a) - \frac{f(b)-f(a)}{g(b)-g(a)}\,(g(x)-g(a)).$$
Note that s is continuous on [a,b] and differentiable on (a,b). By Theorem 3.2.1, s
assumes both a maximum value and a minimum value on [a,b]. However,
$$s(a) = s(b) = 0,$$
and so s is either identically zero or it assumes a non-zero maximum or a non-zero
minimum on (a,b). In any of these cases, s has a critical point in (a,b). Let c be
such a critical point. Since s is differentiable on (a,b), c must be a point at which
$s'$ is 0. Thus,
$$s'(c) = f'(c) - \frac{f(b)-f(a)}{g(b)-g(a)}\,g'(c) = 0,$$
which implies that c satisfies (4.4.1). □

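Example 4.4.2. Prove that $|\cos x - 1| \le \dfrac{x^2}{2}$ for all x.


Solution: We use Cauchy's Mean Value Theorem with f(x) = cos x and $g(x) = x^2$.
For $x \ne 0$, it implies that there is a c between 0 and x such that
$$\frac{\cos x - 1}{x^2} = \frac{\cos x - \cos 0}{x^2 - 0^2} = \frac{-\sin c}{2c}.$$
Since $|\sin c| \le |c|$ (see Example 4.3.8), the right side has absolute value at most 1/2,
which implies that $|\cos x - 1| \le \dfrac{x^2}{2}$. For x = 0 both sides are 0.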
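
The inequality of Example 4.4.2 is easy to test on a grid of sample points. The sketch below is illustrative only: `bound_holds` is a hypothetical helper, and the small slack term is an assumption added solely to absorb floating-point rounding near x = 0.

```python
import math


def bound_holds(x, slack=1e-15):
    """Check |cos x - 1| <= x^2 / 2, allowing a tiny slack for rounding error."""
    return abs(math.cos(x) - 1.0) <= x * x / 2.0 + slack


if __name__ == "__main__":
    grid = [0.1 * k for k in range(-100, 101)]
    print(all(bound_holds(x) for x in grid))
```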




L'Hôpital's Rule. The problem of finding


Ina 


cannot be attacked by using the part of the Main Limit Theorem which deals 
with limits of quotients, because the limit of the denominator is 0. In fact, both 
numerator and denominator have limit 0. A limit problem of this type is called a 
0/0 form. 


Similarly, the problem of finding 


cannot be attacked using the limit of quotients part of the Main Limit Theorem.
This time the problem is that both numerator and denominator have limit $+\infty$. A
limit problem of this type is called an $\infty/\infty$ form.


Problems of this type can often be solved by using the following theorem.
Theorem 4.4.3 (L'Hôpital's Rule). Let f and g be differentiable functions on
a (possibly infinite) interval (a,b) and let u stand for $a^+$ or $b^-$. Suppose g(x) and
$g'(x)$ are non-zero on all of (a,b) and

(1) $\lim_{x\to u} f(x) = 0 = \lim_{x\to u} g(x)$ or

(2) $\lim_{x\to u} f(x) = \infty = \lim_{x\to u} g(x)$.

Then
$$\lim_{x\to u}\frac{f(x)}{g(x)} = \lim_{x\to u}\frac{f'(x)}{g'(x)}, \tag{4.4.2}$$
provided the limit on the right exists.


Proof. We will present the proof in the case where $u = a^+$ and the limit on the
right in (4.4.2) is a finite number L. The case where this limit is infinite can be
reduced to the finite case (Exercise 4.4.16). The proof in the case $u = b^-$ is entirely
analogous.

If $x, y \in (a,b)$, then Cauchy's Mean Value Theorem tells us that there is a c
between x and y such that
$$f(x) - f(y) = (g(x) - g(y))\,\frac{f'(c)}{g'(c)}.$$
Dividing by g(x) and rearranging, we obtain
$$\frac{f(x)}{g(x)} = \frac{f(y)}{g(x)} + \left(1 - \frac{g(y)}{g(x)}\right)\frac{f'(c)}{g'(c)}.$$
On subtracting L and performing some algebra, this becomes
$$\frac{f(x)}{g(x)} - L = \frac{f(y)}{g(x)} + \left(1 - \frac{g(y)}{g(x)}\right)\left(\frac{f'(c)}{g'(c)} - L\right) - L\,\frac{g(y)}{g(x)}.$$
On applying the triangle inequality, this yields
$$\left|\frac{f(x)}{g(x)} - L\right| \le \left|\frac{f(y)}{g(x)}\right| + \left(1 + \left|\frac{g(y)}{g(x)}\right|\right)\left|\frac{f'(c)}{g'(c)} - L\right| + |L|\left|\frac{g(y)}{g(x)}\right|. \tag{4.4.3}$$
Given $\epsilon > 0$, we will show how to make each of the terms on the right in this
inequality be less than $\epsilon/3$ by choosing x sufficiently close to a.



At this point the proof splits into two cases, depending on whether hypothesis
(1) or (2) holds. If (1) holds, then since $\lim_{x\to a^+} f'(x)/g'(x) = L$, Definition 4.1.6
tells us there is an $m \in (a,b)$ so that
$$\left|\frac{f'(c)}{g'(c)} - L\right| < \frac{\epsilon}{6} \tag{4.4.4}$$
whenever a < c < m. This condition will be satisfied if x is any number with
a < x < m and y is any number with a < y < x (since c is between x and y). Now,
given any such x, we can choose a y (depending on x) so that a < y < x and
$$\left|\frac{f(y)}{g(x)}\right| < \frac{\epsilon}{3} \quad\text{and} \tag{4.4.5}$$
$$\left|\frac{g(y)}{g(x)}\right| < \min\left(1, \frac{\epsilon}{3|L|}\right) \tag{4.4.6}$$
(if L = 0, read the minimum as 1; the last term of (4.4.3) is then 0).
This is possible because $\lim_{y\to a^+} f(y) = 0 = \lim_{y\to a^+} g(y)$ holds by hypothesis (1).
Taken together, inequalities (4.4.3) through (4.4.6) imply that
$$\left|\frac{f(x)}{g(x)} - L\right| < \epsilon \quad\text{whenever}\quad a < x < m.$$
This implies that $\lim_{x\to a^+}\dfrac{f(x)}{g(x)} = L$ and completes the proof in the case where (1)
holds.

In the case where hypothesis (2) holds, the proof is almost the same. We still
use (4.4.3), but the order in which x, y, and m are chosen changes and x and y
reverse positions in the interval (a,b). We first choose y such that (4.4.4) holds
whenever a < c < y. This is possible because $\lim_{c\to a^+} f'(c)/g'(c) = L$.

We then choose $m \in (a, y)$ in such a way that (4.4.5) and (4.4.6) hold whenever
a < x < m. This is possible because $\lim_{x\to a^+} g(x) = \infty$ holds by hypothesis (2).
Because m < y, such a choice of x will force x < y and, hence, c < y (again, since
c is between x and y).


As before, inequalities (4.4.3) through (4.4.6) imply that
$$\left|\frac{f(x)}{g(x)} - L\right| < \epsilon \quad\text{whenever}\quad a < x < m.$$
This implies that $\lim_{x\to a^+}\dfrac{f(x)}{g(x)} = L$ and completes the proof in the case where (2)
holds. □


Example 4.4.4. Find $\lim_{x\to 1}\dfrac{\ln x}{x^2-1}$.


Solution: This is a 0/0 form since $\lim_{x\to 1}\ln x = 0 = \lim_{x\to 1}(x^2 - 1)$. If we
differentiate numerator and denominator and take the limit of the resulting fraction,
we get
$$\lim_{x\to 1}\frac{1/x}{2x} = \frac{1}{2}.$$
We conclude that
$$\lim_{x\to 1}\frac{\ln x}{x^2-1} = \frac{1}{2}$$
as well.
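
A quick numerical look at the quotient from this example, next to the quotient of derivatives that L'Hôpital's Rule replaces it with, shows both settling near the same value as x approaches 1. This is only a sketch for intuition; `quotient` and `derivative_quotient` are hypothetical helper names.

```python
import math


def quotient(x):
    """ln(x) / (x^2 - 1), defined for x > 0 with x != 1."""
    return math.log(x) / (x * x - 1.0)


def derivative_quotient(x):
    """(1/x) / (2x), the quotient of the derivatives, defined for x != 0."""
    return (1.0 / x) / (2.0 * x)


if __name__ == "__main__":
    for k in range(1, 6):
        x = 1.0 + 10.0 ** (-k)
        print(x, quotient(x), derivative_quotient(x))
```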


Example 4.4.5. Find $\lim_{x\to\infty}\dfrac{x^2}{e^x}$.


Solution: This is an $\infty/\infty$ form since $\lim_{x\to\infty} e^x = \infty = \lim_{x\to\infty} x^2$. If we
differentiate numerator and denominator and take the limit of the resulting fraction,
we get
$$\lim_{x\to\infty}\frac{2x}{e^x}.$$
This is still an $\infty/\infty$ form. If we again differentiate numerator and denominator
and pass to the limit, we get
$$\lim_{x\to\infty}\frac{2}{e^x} = 0.$$
We conclude from L'Hôpital's Rule that
$$\lim_{x\to\infty}\frac{2x}{e^x} = 0$$
and, hence, that
$$\lim_{x\to\infty}\frac{x^2}{e^x} = 0.$$


Example 4.4.6. Find $\lim_{n\to\infty}(1 + r/n)^n$.


Solution: This is the limit of a sequence. However, we may compute this limit
by replacing the integer-valued variable n with the real-valued variable x. If we find
that $\lim_{x\to\infty}(1 + r/x)^x$ exists, then $\lim_{n\to\infty}(1 + r/n)^n$ must have the same
limit.


We set $f(x) = (1 + r/x)^x$, $g(x) = \ln(f(x)) = x\ln(1 + r/x)$, and y = 1/x. Then
$$\lim_{x\to\infty} g(x) = \lim_{y\to 0^+} g(1/y) = \lim_{y\to 0^+}\frac{\ln(1 + ry)}{y}.$$
This is a 0/0 form and L'Hôpital's Rule implies that this limit is
$$\lim_{y\to 0^+}\frac{r}{1 + ry} = r.$$
Then
$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty} e^{g(x)} = e^r$$
by Theorem 4.1.12.
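
The convergence in Example 4.4.6 is slow but visible numerically. The sketch below simply tabulates the sequence against $e^r$; `compound` is a hypothetical helper name and the sample values of r and n are arbitrary.

```python
import math


def compound(r, n):
    """The n-th term (1 + r/n)^n of the sequence in Example 4.4.6."""
    return (1.0 + r / n) ** n


if __name__ == "__main__":
    r = 2.0
    for n in (10, 1000, 100000, 1000000):
        print(n, compound(r, n), math.exp(r))
```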


Exercise Set 4.4 


1. Prove that if r > 0 and x > 1, then $\ln x \le \dfrac{x^r - 1}{r}$. Hint: Use Cauchy's form of
the Mean Value Theorem with f(x) = ln x and $g(x) = x^r$.


2. Prove that |sin x - x| <


3. Prove that 1+ 2? <e?* for alle ER. 


4. If f is a function which is differentiable on an open interval I containing 0 and
if f(0) = 0, then prove that there is a c between 0 and x at which
$$f(x) = \frac{f'(c)}{n\,c^{\,n-1}}\,x^n.$$
Hint: Apply the Cauchy Mean Value Theorem to f(x) and $g(x) = x^n$.

5. Use the previous exercise and induction to prove that if f is n-times differentiable
on an open interval I containing 0 and if the kth derivative, $f^{(k)}$, of f is
0 at 0 for k = 0, 1, ..., n-1, then there is a c between 0 and x at which
$$f(x) = \frac{f^{(n)}(c)}{n!}\,x^n.$$


Find each of the following limits if they exist:


6. $\lim_{x\to\infty}\dfrac{\ln x}{x^r}$ where r > 0.


7. lim zing. 
8 


14. Let f be a differentiable function on $(0,\infty)$. Prove that if $\lim_{x\to\infty} f(x) = \infty$ and
$\lim_{x\to\infty} f'(x) = L$, then
$$\lim_{x\to\infty}\frac{f(x)}{x} = L.$$


15. Let f be a differentiable function on an interval of the form $(a, +\infty)$. Prove
that if there is a number r > 0 such that $\lim_{x\to\infty}(r f'(x) + f(x)) = L$ is finite,
then $\lim_{x\to\infty} f'(x) = 0$ and $\lim_{x\to\infty} f(x) = L$. Hint: Apply L'Hôpital's Rule
to $\dfrac{e^{x/r} f(x)}{e^{x/r}}$.

16. Finish the proof of Theorem 4.4.3 by showing that if the theorem is true whenever
$\lim_{x\to u} f'(x)/g'(x)$ is finite, then it is also true whenever this limit is
infinite.


Chapter 5 


The Integral 


In this chapter we define the Riemann integral and develop its most important prop- 
erties. We also prove the Fundamental Theorem of Calculus and discuss improper 
integrals. 


5.1. Definition of the Integral 


If [a,b] is a closed, bounded interval, then a partition P of [a,b] is a finite set of
distinct points of [a,b] which contains a and b and is indexed in a way that is
consistent with the order in which the points appear on the line. We denote this
by
$$P = \{a = x_0 < x_1 < \cdots < x_n = b\}.$$
Such a set of points has the effect of dividing [a,b] into a collection of n subintervals
$$[x_0, x_1],\ [x_1, x_2],\ \ldots,\ [x_{n-1}, x_n].$$


Given a partition P of [a,b], as above, and a bounded function f, defined on [a,b],
a Riemann sum for f and P on [a,b] is a sum of the form
$$\sum_{k=1}^{n} f(\bar{x}_k)(x_k - x_{k-1}), \tag{5.1.1}$$
where, for each k, $\bar{x}_k$ is some point in the interval $[x_{k-1}, x_k]$. For each k, the term
$f(\bar{x}_k)(x_k - x_{k-1})$ represents the area (or minus the area, if $f(\bar{x}_k) < 0$) of a rectangle
with width $x_k - x_{k-1}$ and with height $|f(\bar{x}_k)|$ (see Figure 5.1.1).

In calculus, the Riemann integral of f is defined as a limit of Riemann sums, 
although how this limit is defined and how one shows that it actually exists for a 
reasonable class of functions are things that are usually left for a more advanced 
course. This is that course. 


Figure 5.1.1. A Riemann Sum.


Here we will give a precise definition of the integral and prove that it exists for 
a large class of functions on [a,b]. In particular, we will prove that the integral of 
every continuous function on [a,b] exists. 


Upper and Lower Sums. Given a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$
of [a,b] and a bounded function f on [a,b], we can write down two sums which have
every Riemann sum for this partition and this function trapped in between them.
These are the upper and lower sums for P and f:


Definition 5.1.1. Given a partition P and function f, as above, for k = 1, ..., n,
we set
$$M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\} \quad\text{and}\quad m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\}.$$
Then the upper sum for f and P is
$$U(f,P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}), \tag{5.1.2}$$
while the lower sum for f and P is
$$L(f,P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}). \tag{5.1.3}$$


Now, for any choice of $\bar{x}_k \in [x_{k-1}, x_k]$, we have
$$m_k \le f(\bar{x}_k) \le M_k.$$
This inequality remains true if we multiply by the positive number $(x_k - x_{k-1})$.
On summing the resulting inequalities, we conclude that
$$L(f,P) \le \sum_{k=1}^{n} f(\bar{x}_k)(x_k - x_{k-1}) \le U(f,P). \tag{5.1.4}$$
Thus, the upper sum U(f,P) is an upper bound for all Riemann sums for f and
P and the lower sum is a lower bound for all these sums. In fact, it is not hard
to prove that U(f,P) is the least upper bound for all Riemann sums for f and P,
while L(f,P) is the greatest lower bound of this set (Exercise 5.1.6).




Example 5.1.2. Find the upper sum and lower sum for the function $f(x) = x^2$
and the partition P = {0 < 1/4 < 1/2 < 3/4 < 1} of the interval [0, 1].

Solution: The function f is increasing on [0,1] and so its sup on each subinterval
is achieved at the right endpoint of the interval and its inf is achieved at the
left endpoint. Thus,
$$U(f,P) = \frac{1}{16}\left(\frac14 - 0\right) + \frac14\left(\frac12 - \frac14\right) + \frac{9}{16}\left(\frac34 - \frac12\right) + 1\left(1 - \frac34\right) = \frac{15}{32},$$
while
$$L(f,P) = 0\left(\frac14 - 0\right) + \frac{1}{16}\left(\frac12 - \frac14\right) + \frac14\left(\frac34 - \frac12\right) + \frac{9}{16}\left(1 - \frac34\right) = \frac{7}{32}.$$

Refinement of Partitions. Because partitions of [a,b] are simply finite subsets
of [a,b] that contain a and b, we may use set-theoretic relations and operations such
as "$\subset$" and "$\cup$" on them.


Definition 5.1.3. Let P and Q be partitions of a closed bounded interval [a,b].
Then we say that Q is a refinement of P if $P \subset Q$.


For example, the partition 0 < 1/4 < 1/3 < 1/2 < 2/3 < 3/4 < 1 is a
refinement of the partition 0 < 1/4 < 1/2 < 3/4 < 1.


Theorem 5.1.4. Let f be a bounded function on a closed bounded interval [a,b].
If Q and P are partitions of [a,b] and Q is a refinement of P, then
$$L(f,P) \le L(f,Q) \le U(f,Q) \le U(f,P). \tag{5.1.5}$$


Proof. We will prove this in the case where Q is obtained from P by adding just one
additional point to P. The general case then follows from this using an induction
argument on the number of additional points needed to get from P to Q (Exercise
5.1.7).

Suppose $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and Q is obtained by adding one
point y to P. Suppose this new point lies between $x_{j-1}$ and $x_j$. Then, in passing
from P to Q, the subinterval $[x_{j-1}, x_j]$ is cut into the two subintervals $[x_{j-1}, y]$ and
$[y, x_j]$, while all other subintervals $[x_{k-1}, x_k]$ ($k \ne j$) remain the same. Thus, in
the formulas (5.1.2) and (5.1.3) for the upper and lower sums, the terms for which
$k \ne j$ are unchanged when we pass from P to Q. To prove the theorem, we just
need to analyze what happens to the jth terms in (5.1.2) and (5.1.3) when P is
replaced by Q.


With $m_j$ and $M_j$ as in Definition 5.1.1 for the partition P, we set
$$m_j' = \inf\{f(x) : x \in [x_{j-1}, y]\}, \qquad M_j' = \sup\{f(x) : x \in [x_{j-1}, y]\},$$
$$m_j'' = \inf\{f(x) : x \in [y, x_j]\}, \qquad M_j'' = \sup\{f(x) : x \in [y, x_j]\}.$$
Then $m_j = \min\{m_j', m_j''\}$ and $M_j = \max\{M_j', M_j''\}$, and so
$$m_j(x_j - x_{j-1}) = m_j(y - x_{j-1}) + m_j(x_j - y) \le m_j'(y - x_{j-1}) + m_j''(x_j - y),$$
while
$$M_j'(y - x_{j-1}) + M_j''(x_j - y) \le M_j(y - x_{j-1}) + M_j(x_j - y) = M_j(x_j - x_{j-1}).$$
Now (5.1.5) follows from this and the fact that the other terms in the sums
defining U(f,P) and L(f,P) are unchanged when P is replaced by Q. □


Note that any two partitions P and Q of an interval [a,b] have a common
refinement. In fact, the set-theoretic union $P \cup Q$ is a common refinement of P and
Q. This, together with the preceding result, leads to the following theorem, which
says that every lower sum is less than or equal to every upper sum.

Theorem 5.1.5. If P and Q are any two partitions of a closed bounded interval
[a,b] and f is a bounded function on [a,b], then
$$L(f,P) \le U(f,Q).$$
Proof. We simply apply the previous theorem to P and its refinement $P \cup Q$ and
to Q and its refinement $P \cup Q$. This yields
$$L(f,P) \le L(f, P\cup Q) \le U(f, P\cup Q) \le U(f,Q).$$
□


The Integral. Given a closed bounded interval [a,b] and a bounded function f 
on [a, }], we set. 


—b 
/ fda = inf{U(f,Q) : Qa partition of [a, b}}, 


b 
/ fide = sup{L(f,Q) : Qa partition of [a, b]}. 


We will call these the upper integral and lower integral, respectively, of f on [a,b]. 
Theorem 5.1.5 says that every lower sum for f is less than or equal to every upper 
sum for f. Thus, each upper sum U(f, P) is an upper bound for the set of all lower 
sums. Hence, it is at least as large as the least upper bound of this set; that is, 


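\[ \underline{\int_a^b} f\,dx \le U(f,P) \quad \text{for all partitions } P \text{ of } [a,b]. \]

This, in turn, means that $\underline{\int_a^b} f\,dx$ is a lower bound for the set of all upper sums
and, hence, is less than or equal to the greatest lower bound of this set. That is,
\[ \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx. \]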


Definition 5.1.6. Suppose $f$ is a bounded function on a closed bounded interval
$[a,b]$. If the upper and lower integrals of $f$ on $[a,b]$ are equal, we will say that $f$
is integrable on $[a,b]$. In this case the common value of $\overline{\int_a^b} f\,dx$ and
$\underline{\int_a^b} f\,dx$ will be denoted by
\[ \int_a^b f(x)\,dx \]
and will be called the Riemann integral of $f$ on $[a,b]$.


Theorem 5.1.7. The Riemann integral of $f$ on $[a,b]$ exists if and only if, for each
$\epsilon > 0$, there is a partition $P$ of $[a,b]$ such that
\[ U(f,P) - L(f,P) < \epsilon. \tag{5.1.6} \]


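Proof. Suppose the integral exists. Then
\[ \sup_P L(f,P) = \underline{\int_a^b} f\,dx = \overline{\int_a^b} f\,dx = \inf_P U(f,P), \]
where $P$ ranges over all partitions of $[a,b]$. Thus, given $\epsilon > 0$, the number
$\underline{\int_a^b} f\,dx - \epsilon/2$ is not an upper bound for the set of all $L(f,P)$ and the number
$\overline{\int_a^b} f\,dx + \epsilon/2$ is not a lower bound for the set of all $U(f,P)$. This means there
are partitions $Q_1$ and $Q_2$ of $[a,b]$ such that
\[ \underline{\int_a^b} f\,dx - \epsilon/2 < L(f,Q_1) \le U(f,Q_2) < \overline{\int_a^b} f\,dx + \epsilon/2. \]
If $P$ is a common refinement of $Q_1$ and $Q_2$, then Theorem 5.1.4 implies that
\[ \underline{\int_a^b} f\,dx - \epsilon/2 < L(f,Q_1) \le L(f,P) \le U(f,P) \le U(f,Q_2) < \overline{\int_a^b} f\,dx + \epsilon/2. \]
Since $\underline{\int_a^b} f\,dx = \overline{\int_a^b} f\,dx$, this implies that (5.1.6) holds.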


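Conversely, suppose that for each $\epsilon > 0$ there is a partition $P$ such that (5.1.6)
holds. Then
\[ L(f,P) \le \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx \le U(f,P) \]
implies that
\[ \overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx \le U(f,P) - L(f,P) < \epsilon. \]
This means that $0 \le \overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx < \epsilon$ for every positive $\epsilon$, which is possible
only if $\overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx = 0$. Thus, $\underline{\int_a^b} f\,dx = \overline{\int_a^b} f\,dx$. $\square$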
The above theorem leads to a sequential characterization of the Riemann inte- 
gral which will be highly useful in proving theorems about the integral. 


Theorem 5.1.8. The Riemann integral exists if and only if there is a sequence
$\{P_n\}$ of partitions of $[a,b]$ such that
\[ \lim\,(U(f,P_n) - L(f,P_n)) = 0. \tag{5.1.7} \]
In this case,
\[ \int_a^b f(x)\,dx = \lim S_n(f), \]
where, for each $n$, $S_n(f)$ may be chosen to be $U(f,P_n)$, $L(f,P_n)$, or any Riemann
sum (5.1.1) for $f$ and the partition $P_n$.


Proof. If, for every $\epsilon > 0$, we can find a partition $P$ of $[a,b]$ such that (5.1.6) holds,
then, in particular, for each $n \in \mathbb{N}$ we can find a partition $P_n$ such that
\[ U(f,P_n) - L(f,P_n) < 1/n. \]
Then $\lim\,(U(f,P_n) - L(f,P_n)) = 0$.

Conversely, if there is a sequence of partitions $\{P_n\}$ with
\[ \lim\,(U(f,P_n) - L(f,P_n)) = 0, \]
then, given $\epsilon > 0$, there is an $N$ such that
\[ U(f,P_n) - L(f,P_n) < \epsilon \quad \text{whenever } n > N. \]
By the previous theorem, this implies that the Riemann integral exists.

Now given a sequence $\{P_n\}$ satisfying (5.1.7), we know that
\[ L(f,P_n) \le \int_a^b f(x)\,dx \le U(f,P_n) \]
for each $n$. Since the difference $U(f,P_n) - L(f,P_n)$ tends to 0, it follows that the
sequences $\{L(f,P_n)\}$ and $\{U(f,P_n)\}$ both converge to $\int_a^b f(x)\,dx$. However, by (5.1.4),
we also have
\[ L(f,P_n) \le S_n(f) \le U(f,P_n) \]
if $S_n(f)$ is any Riemann sum for $f$ and the partition $P_n$ or is $U(f,P_n)$ or $L(f,P_n)$.
By the squeeze principle (Theorem 2.3.3), we conclude
\[ \int_a^b f(x)\,dx = \lim S_n(f). \qquad \square \]


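Example 5.1.9. Prove that the Riemann integral of $f(x) = x^2$ on $[0,1]$ exists and
is equal to 1/3.

Solution: The function is increasing on $[0,1]$ and so its sup on any subinterval is
achieved at the right endpoint and its inf is achieved at the left endpoint. For each
$n \in \mathbb{N}$ we define a partition $P_n$ of $[0,1]$ by
\[ P_n = \{0 < 1/n < 2/n < \cdots < n/n = 1\}. \]
This divides $[0,1]$ into $n$ subintervals, each of which has length $1/n$. The
corresponding upper sum is then
\[ U(f,P_n) = \sum_{k=1}^{n} \left(\frac{k}{n}\right)^2 \frac{1}{n} = \frac{1}{n^3} \sum_{k=1}^{n} k^2, \]
while the lower sum is
\[ L(f,P_n) = \sum_{k=1}^{n} \left(\frac{k-1}{n}\right)^2 \frac{1}{n} = \frac{1}{n^3} \sum_{k=1}^{n} (k-1)^2. \]
The difference is
\[ U(f,P_n) - L(f,P_n) = \frac{1}{n^3} \sum_{k=1}^{n} \left(k^2 - (k-1)^2\right) = \frac{n^2}{n^3} = \frac{1}{n}. \]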
This sequence certainly has limit 0 and so, by Theorem 5.1.8, the Riemann integral 
exists. To find what it is, we need a formula for the sum $\sum_{k=1}^{n} k^2$. Such a formula
exists. In fact, it can be proved by induction (Exercise 5.1.3) that
\[ \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}. \]
Thus,
\[ U(f,P_n) = \frac{n(n+1)(2n+1)}{6n^3} = \frac{(1 + 1/n)(2 + 1/n)}{6}. \]
This expression has limit 1/3 as $n \to \infty$ and so $\int_0^1 x^2\,dx = 1/3$.
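The computation in this example can be reproduced exactly with rational arithmetic. The sketch below is an illustration only; the function names are assumptions introduced here. It evaluates the upper and lower sums of $x^2$ for the equal partition $P_n$ and compares them with the closed form $(1 + 1/n)(2 + 1/n)/6$.

```python
from fractions import Fraction

def upper_sum(n):
    # sup of x^2 on [(k-1)/n, k/n] is (k/n)^2, since x^2 is increasing on [0, 1].
    return sum(Fraction(k, n) ** 2 * Fraction(1, n) for k in range(1, n + 1))

def lower_sum(n):
    # inf of x^2 on [(k-1)/n, k/n] is ((k-1)/n)^2.
    return sum(Fraction(k - 1, n) ** 2 * Fraction(1, n) for k in range(1, n + 1))

def closed_form(n):
    # (1 + 1/n)(2 + 1/n)/6, obtained from the formula for the sum of squares.
    return (1 + Fraction(1, n)) * (2 + Fraction(1, n)) / 6

for n in (1, 10, 100, 1000):
    U, L = upper_sum(n), lower_sum(n)
    print(n, float(L), float(U), U - L)
```

The printed gap `U - L` is the fraction $1/n$, and both sums close in on 1/3 from either side as $n$ grows.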


Exercise Set 5.1 


1. Find the upper sum $U(f,P)$ and lower sum $L(f,P)$ if $f(x) = 1/x$ on $[1,2]$ and
$P$ is the partition of $[1,2]$ into four subintervals of equal length.

2. Prove that $\int_0^1 x\,dx$ exists by computing $U(f,P_n)$ and $L(f,P_n)$ for the function
$f(x) = x$ and a partition $P_n$ of $[0,1]$ into $n$ equal subintervals. Then show that
condition (5.1.7) of Theorem 5.1.8 is satisfied. Calculate the integral by taking
the limit of the upper sums. Hint: Use Exercise 1.2.9.

3. Prove by induction that
\[ \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}. \]


4. Prove that $\int_0^a x^2\,dx = \frac{a^3}{3}$ by expressing this integral as a limit of Riemann
sums and finding the limit.


5. Let $f$ be the function on $[0,1]$ which is 0 at every rational number and 1 at
every irrational number. Is this function integrable on $[0,1]$? Prove that your
answer is correct by using the definition of the integral.


6. Prove that the upper sum U(f, P) for a partition of [a, b] and a bounded function 
f on [a,b] is the least upper bound of the set of all Riemann sums for f and P. 


7. Finish the proof of Theorem 5.1.4 by showing that if the theorem is true when
only one element is added to P to obtain Q, then it is also true no matter how
many elements need to be added to P to obtain Q.

8. Suppose $m$ and $M$ are lower and upper bounds for $f$ on $[a,b]$; in other words,
$m \le f(x) \le M$ for all $x \in [a,b]$. Prove that
\[ m(b-a) \le \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le M(b-a). \]
What conclusion about $\int_a^b f(x)\,dx$ do you draw from this if the integral exists?


9. If $k$ is a constant and $[a,b]$ is a bounded interval, prove that $k$ is integrable on
$[a,b]$ and
\[ \int_a^b k\,dx = k(b-a). \]

10. If $f$ is a bounded function on $[a,b]$ and $P = \{x_0 < x_1 < \cdots < x_n\}$ is a partition
of $[a,b]$, show that
\[ U(f,P) - L(f,P) = \sum_{k=1}^{n} (M_k - m_k)(x_k - x_{k-1}), \]
where $M_k$ is the sup and $m_k$ the inf of $f$ on $[x_{k-1}, x_k]$. What does this simplify
to if $P$ is a partition of $[a,b]$ into $n$ equal subintervals?

11. Suppose $f$ is any non-decreasing function on a bounded interval $[a,b]$. If $P_n$ is
the partition of $[a,b]$ into $n$ equal subintervals, show that
\[ U(f,P_n) - L(f,P_n) = (f(b) - f(a))\,\frac{b-a}{n}. \]
What do you conclude about the integrability of $f$?


5.2. Existence and Properties of the Integral 


We first show that the integral exists for a large class of functions, a class which 
includes all the functions of interest to us in this course. We then show that the 
integral has the properties claimed for it in calculus courses. 


Existence Theorems. 


Theorem 5.2.1. If $f$ is a monotone function on a closed bounded interval $[a,b]$,
then $f$ is integrable on $[a,b]$.

Proof. This was essentially stated as an exercise (Exercise 5.1.11) in the previous
section. In that exercise, it is claimed that, if $f$ is a non-decreasing function on
$[a,b]$ and $P_n$ is the partition of $[a,b]$ into $n$ equal subintervals, then
\[ U(f,P_n) - L(f,P_n) = (f(b) - f(a))\,\frac{b-a}{n}. \tag{5.2.1} \]
This implies that
\[ \lim\,(U(f,P_n) - L(f,P_n)) = 0 \]
and, by Theorem 5.1.8, this implies that the Riemann integral of $f$ on $[a,b]$ exists.

In the case where $f$ is non-increasing, the same proof works. The only difference
is that $f(b) - f(a)$ is replaced by $f(a) - f(b)$ in (5.2.1). $\square$
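Formula (5.2.1) does not require continuity, only monotonicity. As a hedged illustration, the sketch below uses a hypothetical non-decreasing function with jump discontinuities (chosen here, not taken from the text) and checks the formula exactly for several values of $n$.

```python
from fractions import Fraction
import math

def f(x):
    # Hypothetical example: non-decreasing on [0, 2], with jumps at multiples of 1/3.
    return math.floor(3 * x) + x

def gap(f, a, b, n):
    """U(f, P_n) - L(f, P_n) for the partition of [a, b] into n equal subintervals,
    assuming f is non-decreasing (sup at the right endpoint, inf at the left)."""
    h = (b - a) / n
    points = [a + k * h for k in range(n + 1)]
    upper = sum(f(x) * h for x in points[1:])
    lower = sum(f(x) * h for x in points[:-1])
    return upper - lower

a, b = Fraction(0), Fraction(2)
for n in (1, 4, 25, 400):
    # Left: the computed gap. Right: (f(b) - f(a)) (b - a) / n from (5.2.1).
    print(n, gap(f, a, b, n), (f(b) - f(a)) * (b - a) / n)
```

Since the gap is a constant divided by $n$, it tends to 0, which is exactly what Theorem 5.1.8 needs.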


Theorem 5.2.2. If $f$ is a continuous function on a closed, bounded interval $[a,b]$,
then $f$ is integrable on $[a,b]$.

Proof. Since $f$ is continuous on the closed, bounded interval $[a,b]$, it is uniformly
continuous on $[a,b]$ by Theorem 3.3.4. Thus, given $\epsilon > 0$, there is a $\delta > 0$ such that
\[ |f(x) - f(y)| < \frac{\epsilon}{b-a} \quad \text{whenever } |x - y| < \delta. \]
If $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ is any partition of $[a,b]$ with the
property that the interval $[x_{k-1}, x_k]$ has length less than $\delta$ for each $k$, then the
maximum value $M_k$ of $f$ on this interval and the minimum value $m_k$ of $f$ on this
interval differ by less than $\epsilon/(b-a)$. By Exercise 5.1.10, this implies that
\[ U(f,P) - L(f,P) = \sum_{k=1}^{n} (M_k - m_k)(x_k - x_{k-1}) < \frac{\epsilon}{b-a} \sum_{k=1}^{n} (x_k - x_{k-1}) = \epsilon, \]
since $\sum_{k=1}^{n} (x_k - x_{k-1}) = b - a$. It follows from Theorem 5.1.7 that $f$ is integrable on
$[a,b]$. $\square$
Linearity of the Integral. In the remainder of this section we adopt the following
notation, introduced in Section 1.5 for the sup and inf of a function $f$ on an
interval $I$:
\[ \sup_I f = \sup\{f(x) : x \in I\} \quad \text{and} \quad \inf_I f = \inf\{f(x) : x \in I\}. \]

The integral is a linear transformation from the space of integrable functions
on $[a,b]$ to the real numbers. This just means that the following familiar theorem
is true.

Theorem 5.2.3. If $f$ and $g$ are integrable functions on a closed, bounded interval
$[a,b]$ and $c$ is a constant, then

(a) $cf$ is integrable and $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$;

(b) $f + g$ is integrable and $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.


Proof. We begin by investigating the upper and lower sums for a partition $P =
\{a = x_0 < x_1 < \cdots < x_n = b\}$ and the functions $cf$ and $f + g$. We let $I_k = [x_{k-1}, x_k]$
denote the $k$th subinterval determined by this partition.

If $c > 0$, then Theorem 1.5.10(a) tells us that
\[ \sup_{I_k} cf = c \sup_{I_k} f \quad \text{and} \quad \inf_{I_k} cf = c \inf_{I_k} f \]
for $k = 1, \ldots, n$. This implies that
\[ U(cf,P) = c\,U(f,P) \quad \text{and} \quad L(cf,P) = c\,L(f,P) \quad \text{if } c > 0. \tag{5.2.2} \]
On the other hand, by Theorem 1.5.10(b),
\[ \sup_{I_k} (-f) = -\inf_{I_k} f \quad \text{and} \quad \inf_{I_k} (-f) = -\sup_{I_k} f \]
for each $k$. This implies that
\[ U(-f,P) = -L(f,P) \quad \text{and} \quad L(-f,P) = -U(f,P). \tag{5.2.3} \]

For the sum $f + g$, we have
\[ \inf_{I_k} f + \inf_{I_k} g \le \inf_{I_k} (f + g) \le \sup_{I_k} (f + g) \le \sup_{I_k} f + \sup_{I_k} g \]
for each $k$, by Theorem 1.5.10(c). These inequalities imply that
\[ L(f,P) + L(g,P) \le L(f+g,P) \le U(f+g,P) \le U(f,P) + U(g,P). \tag{5.2.4} \]
With these results in hand, the proof of the theorem becomes a relatively simple 


matter. We use Theorem 5.1.8. Since $f$ is integrable, there is a sequence $\{P_n\}$ of
partitions of $[a,b]$ such that
\[ \lim\,(U(f,P_n) - L(f,P_n)) = 0. \tag{5.2.5} \]
If $c > 0$, then (5.2.2) implies that
\[ \lim\,(U(cf,P_n) - L(cf,P_n)) = \lim c\,(U(f,P_n) - L(f,P_n)) = 0, \]
which implies that $cf$ is integrable. It also follows from (5.2.2) that
\[ \int_a^b c f(x)\,dx = \lim U(cf,P_n) = c \lim U(f,P_n) = c \int_a^b f(x)\,dx. \]
Similarly, using (5.2.3) yields
\[ \lim\,(U(-f,P_n) - L(-f,P_n)) = \lim\,(-L(f,P_n) + U(f,P_n)) = 0, \]
which implies that $-f$ is integrable. It also follows from (5.2.3) that
\[ \int_a^b -f(x)\,dx = \lim U(-f,P_n) = -\lim L(f,P_n) = -\int_a^b f(x)\,dx. \]
Combining these results proves part (a) of the theorem.

Since $g$ is also integrable, there is a sequence of partitions $\{Q_n\}$ such that (5.2.5)
holds with $f$ replaced by $g$ and $P_n$ replaced by $Q_n$. In fact, we may replace $\{P_n\}$
and $\{Q_n\}$ by the sequence of common refinements $\{P_n \cup Q_n\}$ and get a sequence of
partitions that works for both $f$ and $g$. Since this is so, we may as well assume that
$\{P_n\}$ was chosen in the first place to be a sequence of partitions such that (5.2.5)
holds and
\[ \lim\,(U(g,P_n) - L(g,P_n)) = 0 \tag{5.2.6} \]
also holds. Then (5.2.4) implies that
\[ 0 \le U(f+g,P_n) - L(f+g,P_n) \le U(f,P_n) - L(f,P_n) + U(g,P_n) - L(g,P_n). \]
Since the expression on the right has limit 0, so does $U(f+g,P_n) - L(f+g,P_n)$.
Hence, $f + g$ is integrable. Also, on passing to the limit as $P$ ranges through the
sequence of partitions $P_n$, inequality (5.2.4) implies that
\[ \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]
This completes the proof of part (b) of the theorem. $\square$


The Order Preserving Property. The integral is order preserving:

Theorem 5.2.4. If $f$ and $g$ are integrable functions on $[a,b]$ and $f(x) \le g(x)$ for
all $x \in [a,b]$, then
\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]


Proof. We first prove that if $h$ is an integrable function which is non-negative on
$[a,b]$, then
\[ \int_a^b h(x)\,dx \ge 0. \]
In fact, this is obvious. If $h$ is non-negative, then its inf on any subinterval in
any partition is also non-negative. This implies that the lower sum $L(h,P)$ is
non-negative for any partition $P$. Since the integral is greater than or equal to every
lower sum, it is non-negative.

To finish the proof, we apply the result of the previous paragraph to the function
$h = g - f$, which is non-negative on $[a,b]$ if $f(x) \le g(x)$ for $x \in [a,b]$. Using linearity
(Theorem 5.2.3), we conclude that
\[ \int_a^b g(x)\,dx - \int_a^b f(x)\,dx = \int_a^b (g(x) - f(x))\,dx \ge 0. \]
This proves the theorem. $\square$


This has the following useful corollary. Its proof is left to the exercises. 


Corollary 5.2.5. Let $f$ be an integrable function on the closed bounded interval
$I = [a,b]$ and set $M = \sup_I f$ and $m = \inf_I f$. Then
\[ m(b-a) \le \int_a^b f(x)\,dx \le M(b-a). \]

Theorem 5.2.6. If $f$ is integrable on $[a,b]$, then $|f|$ is also integrable on $[a,b]$ and
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]


Proof. Let $f$ be integrable on $[a,b]$. Suppose we can show that $|f|$ is also
integrable on $[a,b]$. To derive the above inequality is then quite easy. The inequalities
$-|f(x)| \le f(x) \le |f(x)|$, together with Theorem 5.2.4, imply that
\[ -\int_a^b |f(x)|\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx \]
and this implies that
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]
To complete the proof, we must show that the integrability of $f$ on $[a,b]$ implies
the integrability of $|f|$.

Let $I$ be an arbitrary subinterval of $[a,b]$. Then, by the triangle inequality,
\[ |f(x)| - |f(y)| \le |f(x) - f(y)| \]
for all $x, y \in I$. It follows from this (Exercise 5.2.8) that
\[ \sup_I |f| - \inf_I |f| \le \sup_I f - \inf_I f. \]
If we apply this as $I$ ranges over each subinterval in a partition $P$, the result for
the upper and lower sums is
\[ U(|f|,P) - L(|f|,P) \le U(f,P) - L(f,P). \]
It now follows from Theorem 5.1.7 that $|f|$ is integrable on $[a,b]$ if $f$ is integrable
on $[a,b]$. $\square$


Theorem of the Mean for Integrals. If $f$ is an integrable function on a bounded
interval $[a,b]$, then the mean or average of $f$ on $[a,b]$ is the number
\[ \frac{1}{b-a} \int_a^b f(x)\,dx. \]
The following theorem is an easy consequence of the Intermediate Value Theorem
and Corollary 5.2.5. We leave its proof to the exercises.

Theorem 5.2.7. If $f$ is a continuous function on a closed bounded interval $[a,b]$,
then there is a point $c \in [a,b]$ such that
\[ f(c) = \frac{1}{b-a} \int_a^b f(x)\,dx. \]


Interval Additivity. Note that, in the following theorem, we do not assume 
that f is integrable. 


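Theorem 5.2.8. Suppose $a < b < c$ and $f$ is a bounded function defined on $[a,c]$.
Then the upper and lower integrals of $f$ satisfy
\[ \overline{\int_a^c} f(x)\,dx = \overline{\int_a^b} f(x)\,dx + \overline{\int_b^c} f(x)\,dx, \qquad \underline{\int_a^c} f(x)\,dx = \underline{\int_a^b} f(x)\,dx + \underline{\int_b^c} f(x)\,dx. \]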


Proof. We will prove the result for the lower integral. The proof for the upper
integral is analogous.

Let $P = \{a = x_0 < x_1 < \cdots < x_n = c\}$ be a partition of $[a,c]$ which has the
point $b$ as its $m$th partition point. Then $P$ determines partitions
\[ P' = \{a = x_0 < x_1 < \cdots < x_m = b\} \text{ of } [a,b] \quad \text{and} \quad P'' = \{b = x_m < x_{m+1} < \cdots < x_n = c\} \text{ of } [b,c]. \]
In this case,
\[ L(f,P') + L(f,P'') = L(f,P). \tag{5.2.7} \]
Each pair consisting of a partition $P'$ of $[a,b]$ and a partition $P''$ of $[b,c]$ fit together
to form a partition $P$ of $[a,c]$ of this type.

Now let $Q$ be any partition of $[a,c]$. Then the union of $Q$ with the singleton
set $\{b\}$ forms a refinement $P$ of $Q$ which is of the above type. Then
\[ L(f,Q) \le L(f,P) \le \underline{\int_a^c} f(x)\,dx. \]
But $\underline{\int_a^c} f(x)\,dx$ is the sup of all numbers of the form $L(f,Q)$ for $Q$ a partition of
$[a,c]$, and $L(f,P) = L(f,P') + L(f,P'')$. It follows from Theorem 1.5.7(c) that
\[ \underline{\int_a^b} f(x)\,dx + \underline{\int_b^c} f(x)\,dx = \sup L(f,P') + \sup L(f,P'') = \sup\,(L(f,P') + L(f,P'')) = \underline{\int_a^c} f(x)\,dx, \]
where $P'$ and $P''$ range over arbitrary partitions of $[a,b]$ and $[b,c]$. This proves
the theorem for lower integrals. The proof for upper integrals is essentially the
same. $\square$
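The two facts the proof rests on, that adjoining $b$ to a partition can only raise the lower sum and that the lower sum then splits as in (5.2.7), can be seen in a small exact computation. This is a sketch under stated assumptions: the function and the partition are hypothetical, and the lower sum is computed from left endpoints, which is valid only because the chosen function is non-decreasing.

```python
from fractions import Fraction

def f(x):
    # Hypothetical example: non-decreasing on [0, 3].
    return x * x

def lower_sum(f, partition):
    # Valid for non-decreasing f: the inf on each subinterval is the left-endpoint value.
    return sum(f(u) * (v - u) for u, v in zip(partition, partition[1:]))

a, b, c = Fraction(0), Fraction(1), Fraction(3)
Q = [a, Fraction(1, 2), Fraction(2), c]   # a partition of [a, c] that misses b
P = sorted(set(Q) | {b})                  # the refinement of Q obtained by adjoining b
m = P.index(b)
P1, P2 = P[: m + 1], P[m:]                # induced partitions of [a, b] and [b, c]

print(lower_sum(f, Q), lower_sum(f, P), lower_sum(f, P1) + lower_sum(f, P2))
```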


This theorem has as a corollary the interval additivity property for the integral. 
The details of how this corollary follows from the above theorem are left to the 
exercises.

Corollary 5.2.9. With $f$ and $a < b < c$ as in the previous theorem, $f$ is integrable
on $[a,c]$ if and only if it is integrable on $[a,b]$ and on $[b,c]$. In this case,
\[ \int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx. \]


A Stronger Existence Theorem. Another consequence of the interval addi- 
tivity theorem (Theorem 5.2.8) is the following stronger version of the existence 
theorem for integrals of continuous functions (Theorem 5.2.2). The proof is left to 
the exercises. 


Theorem 5.2.10. If $f$ is a bounded function on a closed bounded interval $[a,b]$
and $f$ is continuous except at finitely many points of $[a,b]$, then $f$ is integrable on
$[a,b]$.

Exercise Set 5.2 


1. Show that if a function $f$ on a bounded interval $[a,b]$ can be written in the form $g - h$
for functions $g$ and $h$ which are non-decreasing on $[a,b]$, then $f$ is integrable on
$[a,b]$.

2. If $f$ is a bounded function defined on a closed bounded interval $[a,b]$ and if $f$ is
integrable on each interval $[a,r]$ with $a < r < b$, then prove that $f$ is integrable
on $[a,b]$ and
\[ \int_a^b f(x)\,dx = \lim_{r \to b} \int_a^r f(x)\,dx. \]
Observe that the analogous result holds if $[a,r]$ is replaced by $[r,b]$ in the
hypothesis and in the integral on the right and the limit is taken as $r \to a$.
Hint: Use Theorem 5.2.8 and Exercise 5.1.8.

3. Prove Theorem 5.2.10. That is, prove that if f is a bounded function on a 
bounded interval [a,b] and f is continuous except at finitely many points in 
[a,b], then f is integrable on [a,b]. Hint: Use the preceding exercise, interval 
additivity, and an induction argument on the number of discontinuities. 


4. Prove Corollary 5.2.5.

5. Prove Corollary 5.2.9.

6. Prove that $1 \le \int_{-1}^{1} \frac{1}{1 + x^{2n}}\,dx \le 2$ for all $n \in \mathbb{N}$.

7. Prove that $\frac{1}{3} \le \int_{-1}^{1} \frac{x^2}{1 + x^{2n}}\,dx \le \frac{2}{3}$ for all $n \in \mathbb{N}$.


8. If $f$ is a bounded function defined on an interval $I$, then prove that
\[ \sup_I |f| - \inf_I |f| \le \sup_I f - \inf_I f \]
by using the triangle inequality, $|f(x)| - |f(y)| \le |f(x) - f(y)|$, and Theorem
1.5.10(d).

9. Prove that if $f$ is integrable on $[a,b]$, then so is $f^2$. Hint: If $|f(x)| \le M$ for all
$x \in [a,b]$, then show that
\[ |f^2(x) - f^2(y)| \le 2M\,|f(x) - f(y)| \]
for all $x, y \in [a,b]$. Use this to estimate $U(f^2,P) - L(f^2,P)$, for a given partition
$P$, in terms of $U(f,P) - L(f,P)$.

10. Prove that if $f$ and $g$ are integrable on $[a,b]$, then so is $fg$. Hint: Write $fg$ as
the difference of two squares of functions you know are integrable and then use
the previous exercise.

11. Give an example of a function $f$ such that $|f|$ is integrable on $[0,1]$ but $f$ is not
integrable on $[0,1]$.

12. Prove Theorem 5.2.7.

13. Let $\{f_n\}$ be a sequence of integrable functions defined on a closed bounded
interval $[a,b]$. If $\{f_n\}$ converges uniformly on $[a,b]$ to a function $f$, prove that
$f$ is integrable and
\[ \int_a^b f(x)\,dx = \lim \int_a^b f_n(x)\,dx. \]

14. Is the function which is $\sin 1/x$ for $x \ne 0$ and 0 for $x = 0$ integrable on $[0,1]$?
Justify your answer.

5.3. The Fundamental Theorems of Calculus 


There are two fundamental theorems of calculus. Both relate differentiation to 
integration. In most calculus courses, the Second Fundamental Theorem is usually 
proved first and then used to prove the First Fundamental Theorem. Unfortunately, 
this results in a First Fundamental Theorem that is weaker than it could be. To 
prove the best possible theorems, one should give independent proofs of the two 
theorems. This is what we shall do. 




First Fundamental Theorem. The following theorem concerns the integral of
$f'$ on $[a,b]$ where $f$ is a function which we assume is differentiable on $(a,b)$ but
not necessarily at $a$ or $b$. The reason the integral still makes sense is that, for a
function that is integrable on $[a,b]$, changing its value at one point (or at finitely
many points) does not affect its integrability or its integral (Exercise 5.3.9). Thus,
a function which is missing values at $a$ and/or $b$ can be assigned values there
arbitrarily and the integrability and value of the integral will not depend on how
this is done.

Theorem 5.3.1. Let $[a,b]$ be a closed bounded interval and let $f$ be a function
which is continuous on $[a,b]$ and differentiable on $(a,b)$ with $f'$ integrable on $[a,b]$.
Then
\[ \int_a^b f'(x)\,dx = f(b) - f(a). \]


Proof. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a partition of $[a,b]$. We apply
the Mean Value Theorem to $f$ on each of the intervals $[x_{k-1}, x_k]$. This tells us there
is a point $c_k \in (x_{k-1}, x_k)$ such that
\[ f'(c_k)(x_k - x_{k-1}) = f(x_k) - f(x_{k-1}). \]
If we sum this over $k = 1, \ldots, n$, the result is
\[ \sum_{k=1}^{n} f'(c_k)(x_k - x_{k-1}) = f(b) - f(a). \]
The sum on the left is a Riemann sum for $f'$ and the partition $P$ and so, by (5.1.4),
it lies between the lower and upper sums for $f'$ and $P$. Thus,
\[ L(f',P) \le f(b) - f(a) \le U(f',P). \tag{5.3.1} \]
Since $f'$ is integrable on $[a,b]$, there is a sequence of partitions $\{P_n\}$ for which
the corresponding sequences of upper and lower sums for $f'$ both converge to
$\int_a^b f'(x)\,dx$. However, in view of (5.3.1) the only number both sequences can
converge to is $f(b) - f(a)$. $\square$
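The telescoping step of this proof can be watched in action when the mean value points are available in closed form. The sketch below is illustrative and uses assumptions not in the text: the function $f(x) = x^3$, an arbitrary uneven partition of $[0,2]$, and the explicit formula for $c_k$ that holds for this particular $f$.

```python
import math

def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def mean_value_point(u, v):
    # For f(x) = x^3 with 0 <= u < v: f'(c) = (v^3 - u^3)/(v - u) = v^2 + u*v + u^2,
    # so c = sqrt((v^2 + u*v + u^2) / 3), which lies strictly between u and v.
    return math.sqrt((v * v + u * v + u * u) / 3)

a, b = 0.0, 2.0
partition = [a, 0.3, 0.5, 1.1, 1.25, 1.9, b]   # an arbitrary, uneven partition

riemann_sum = 0.0
for u, v in zip(partition, partition[1:]):
    c = mean_value_point(u, v)
    assert u < c < v                      # c_k is an interior point, as the theorem says
    riemann_sum += f_prime(c) * (v - u)   # this term equals f(v) - f(u) up to rounding

print(riemann_sum, f(b) - f(a))
```

Whatever partition is used, this particular Riemann sum equals $f(b) - f(a)$ up to floating-point rounding, which is why it pins down the value of the integral.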


The above theorem is somewhat stronger than the one usually stated in calculus,
because we only assume that the derivative $f'$ is integrable on $[a,b]$, not that
it is continuous. Are there functions which are differentiable with an integrable
derivative which is not continuous?


Example 5.3.2. Find a function $f$ which is differentiable on an interval, with an
integrable derivative which is not continuous.

Solution: Let $f(x) = x^2 \sin 1/x$ if $x \ne 0$ and set $f(0) = 0$. Then, $f$ is
differentiable on all of $\mathbb{R}$ and its derivative is
\[ f'(x) = 2x \sin 1/x - \cos 1/x \quad \text{if } x \ne 0 \]
and it is 0 at $x = 0$. This follows from the Chain Rule and the Product Rule for
derivatives everywhere except at $x = 0$. At $x = 0$ we calculate the derivative using
the definition of derivative:
\[ f'(0) = \lim_{x \to 0} \frac{x^2 \sin 1/x}{x} = \lim_{x \to 0} x \sin 1/x = 0. \]
Now the function $f'(x)$ is integrable on any closed bounded interval (see
Exercise 5.2.2), but it is not continuous at 0. Thus, $f$ is a function to which the previous
theorem applies, but it does not have a continuous derivative.


Second Fundamental Theorem. So far we have defined the integral $\int_a^b f(x)\,dx$
only in the case where $a < b$. We remedy this by defining the integral to be 0 if
$a = b$ and declaring
\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx \quad \text{if } b < a. \]
This ensures that the integral in the following theorem makes sense whether $x$ is
to the left or the right of $a$.


Theorem 5.3.3. Let $f$ be a function which is integrable on a closed bounded
interval $[b,c]$. For $a, x \in [b,c]$ define a function $F(x)$ by
\[ F(x) = \int_a^x f(t)\,dt. \]
Then $F$ is continuous on $[b,c]$. At each point $x$ of $(b,c)$ where $f$ is continuous the
function $F$ is differentiable and
\[ F'(x) = f(x). \]


Proof. The definition of $F$ makes sense, because it follows from Theorem 5.2.8
that a function integrable on an interval is also integrable on every subinterval.

Since $f$ is integrable on $[b,c]$, it is bounded on $[b,c]$. Thus, there is an $M$ such
that
\[ |f(t)| \le M \quad \text{for all } t \in [b,c]. \]
If $x, y \in [b,c]$, then
\[ F(y) - F(x) = \int_a^y f(t)\,dt - \int_a^x f(t)\,dt = \int_x^y f(t)\,dt \tag{5.3.2} \]
(see Exercise 5.3.11). Then by Exercise 5.3.12,
\[ |F(y) - F(x)| = \left| \int_x^y f(t)\,dt \right| \le M\,|y - x|. \]
Thus, given $\epsilon > 0$, if we choose $\delta = \epsilon/M$, then
\[ |F(y) - F(x)| < \epsilon \quad \text{whenever } |y - x| < \delta. \]
This shows that $F$ is uniformly continuous on $[b,c]$.
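Now suppose $x \in (b,c)$ is a point at which $f$ is continuous. If $y \ne x$ is also in $(b,c)$,
then
\[ \int_x^y f(x)\,dt = f(x)(y - x) \]
since $f(x)$ is a constant as far as the variable of integration $t$ is concerned. This
and (5.3.2) imply that
\[ \frac{F(y) - F(x)}{y - x} - f(x) = \frac{1}{y - x} \left( \int_x^y f(t)\,dt - \int_x^y f(x)\,dt \right) = \frac{1}{y - x} \int_x^y (f(t) - f(x))\,dt. \tag{5.3.3} \]
Since $f$ is continuous at $x$, given $\epsilon > 0$, we may choose $\delta > 0$ such that
\[ |f(t) - f(x)| < \epsilon \quad \text{whenever } |x - t| < \delta. \]
Then, for $y$ with $|y - x| < \delta$, it will be true that $|x - t| < \delta$ for every $t$ between $x$
and $y$. Thus, for such a choice of $y$, we have
\[ \left| \frac{1}{y - x} \int_x^y (f(t) - f(x))\,dt \right| \le \frac{|y - x|}{|y - x|}\,\epsilon = \epsilon. \]
In view of (5.3.3), this implies that
\[ \left| \frac{F(y) - F(x)}{y - x} - f(x) \right| \le \epsilon \quad \text{whenever } 0 < |y - x| < \delta. \]
Thus, $F$ is differentiable at $x$ and $F'(x) = f(x)$. $\square$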


Example 5.3.4. Find $\frac{d}{dx} \int_0^{\sin x} e^{-t^2}\,dt$.

Solution: This is a composite function. If $F(u) = \int_0^u e^{-t^2}\,dt$, then the
function we are asked to differentiate is $F(\sin x)$. By the Chain Rule, the derivative of
this composite function is
\[ F'(\sin x) \cos x. \]
By the previous theorem, $F'(u) = e^{-u^2}$. Thus,
\[ \frac{d}{dx} \int_0^{\sin x} e^{-t^2}\,dt = F'(\sin x) \cos x = e^{-\sin^2 x} \cos x. \]


Example 5.3.5. Find $\frac{d}{dx} \int_x^{2x} \sin t^2\,dt$.

Solution: We begin by writing
\[ G(x) = \int_x^{2x} \sin t^2\,dt = \int_0^{2x} \sin t^2\,dt - \int_0^x \sin t^2\,dt. \]
Then using the previous theorem and the Chain Rule yields
\[ G'(x) = 2 \sin 4x^2 - \sin x^2. \]
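A numerical sanity check of this derivative is easy to set up. The following sketch is only an approximation-based illustration: the midpoint rule, the step sizes, and the sample points are choices made here, not part of the text. It compares a central difference quotient of $G$ with the formula $2 \sin 4x^2 - \sin x^2$.

```python
import math

def integrate(h, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of h from lo to hi."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

def G(x):
    # G(x) = integral of sin(t^2) from x to 2x, approximated numerically.
    return integrate(lambda t: math.sin(t * t), x, 2 * x)

def G_prime_formula(x):
    # The derivative obtained from the Second Fundamental Theorem and the Chain Rule.
    return 2 * math.sin(4 * x * x) - math.sin(x * x)

def central_difference(F, x, h=1e-4):
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (0.3, 0.7, 1.2):
    print(x, central_difference(G, x), G_prime_formula(x))
```

The two printed columns should agree to several decimal places; the remaining difference comes from the quadrature and the finite difference, not from the formula.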


Substitution. We will not rehash all the integration techniques that are taught 
in the typical calculus class. However, two of these techniques are of such great 
theoretical importance that it is worth discussing them again. The techniques in 
question are substitution and integration by parts. Each of these follows from the 
Fundamental Theorems and an important theorem from differential calculus — the 
chain rule in the case of substitution and the product rule in the case of integration 
by parts. We begin with substitution. 


Theorem 5.3.6. Let $g$ be a differentiable function on an open interval $I$ with $g'$
integrable on $I$ and let $J = g(I)$. Let $f$ be continuous on $J$. Then for any pair
$a, b \in I$,
\[ \int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du. \tag{5.3.4} \]


Proof. The composite function $f \circ g$ is continuous on $I$ since $g$ is continuous on $I$
and $f$ is continuous on $J$. By Exercise 5.2.10, this implies that $f(g(t))\,g'(t)$ is an
integrable function of $t$ on $I$. We set
\[ F(v) = \int_{g(a)}^{v} f(u)\,du. \]
Then $F'(v) = f(v)$ by the Second Fundamental Theorem, and so, by the Chain
Rule,
\[ (F(g(x)))' = f(g(x))\,g'(x). \]
So, $F \circ g$ is a differentiable function on $I$ with an integrable derivative $f(g(x))\,g'(x)$.
By the First Fundamental Theorem,
\[ F(g(b)) - F(g(a)) = \int_a^b f(g(x))\,g'(x)\,dx. \]
By the definition of $F$, $F(g(a)) = 0$ and $F(g(b)) = \int_{g(a)}^{g(b)} f(u)\,du$. Thus,
\[ \int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du, \]
as claimed. $\square$

Note that the above theorem states formally what happens when we make the
substitution $u = g(t)$ in the integral on the left in (5.3.4).
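Formula (5.3.4) does not require $g$ to be monotone, and a quick numerical experiment makes this visible. The sketch below is an illustration under assumptions chosen here: $g(t) = t^2 + 1$, $f = \cos$, the interval $[-1, 2]$, and a midpoint-rule approximation of both integrals.

```python
import math

def integrate(h, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of h from lo to hi."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

def g(t):
    # Hypothetical choice: differentiable but not monotone on [-1, 2].
    return t * t + 1

def g_prime(t):
    return 2 * t

f = math.cos   # continuous on the image of g

a, b = -1.0, 2.0
left = integrate(lambda t: f(g(t)) * g_prime(t), a, b)   # integral of f(g(t)) g'(t) dt
right = integrate(f, g(a), g(b))                          # integral of f(u) du from g(a) to g(b)
exact = math.sin(g(b)) - math.sin(g(a))                   # antiderivative of cos is sin

print(left, right, exact)
```

All three printed numbers should agree up to the quadrature error.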


Integration by Parts. The integration by parts formula is a direct consequence 
of the Fundamental Theorems and the product rule for differentiation. 


Theorem 5.3.7. Suppose $f$ and $g$ are continuous functions on a closed bounded
interval $[a,b]$ and suppose that $f$ and $g$ are differentiable on $(a,b)$ with derivatives
that are integrable on $[a,b]$. Then $fg'$ and $f'g$ are integrable on $[a,b]$ and
\[ \int_a^b f(x)\,g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b g(x)\,f'(x)\,dx. \tag{5.3.5} \]


Proof. We have $f$ and $g$ integrable because they are continuous on $[a,b]$, while
$f'$ and $g'$ are integrable by hypothesis. By Exercise 5.2.10, $fg'$ and $gf'$ are both
integrable.

The product $fg$ is differentiable on $(a,b)$ and
\[ (fg)' = fg' + gf'. \]
Thus, $(fg)'$ is also integrable and, by the First Fundamental Theorem,
\[ f(b)g(b) - f(a)g(a) = \int_a^b (f(x)g(x))'\,dx = \int_a^b f(x)\,g'(x)\,dx + \int_a^b g(x)\,f'(x)\,dx. \]
Formula (5.3.5) follows immediately from this. $\square$


Example 5.3.8. Suppose $f$ is a continuous function on $[-\pi, \pi]$ which is
differentiable on $(-\pi, \pi)$ with an integrable derivative. Also suppose $f(-\pi) = f(\pi)$. Prove
that, for each $n \in \mathbb{N}$,
\[ \int_{-\pi}^{\pi} f'(x) \sin nx\,dx = -n \int_{-\pi}^{\pi} f(x) \cos nx\,dx, \qquad \int_{-\pi}^{\pi} f'(x) \cos nx\,dx = n \int_{-\pi}^{\pi} f(x) \sin nx\,dx. \tag{5.3.6} \]

Solution: These are the equations relating the Fourier coefficients of the
derivative of a function $f$ to the Fourier coefficients of $f$ itself.

The first equation is proved using the integration by parts formula (5.3.5), with
$f'$ as the differentiated factor and $g(x) = \sin nx$. Since $\sin(-n\pi) = \sin(n\pi) = 0$,
the terms $f(b)g(b) - f(a)g(a)$ are 0. The first equation then follows directly from (5.3.5).

The second equation follows from (5.3.5) in the same way with $g(x) = \cos nx$. However,
this time the terms $f(b)g(b) - f(a)g(a)$ contribute 0 because $\cos$ is an even function
and $f(-\pi) = f(\pi)$.
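The two identities in (5.3.6) can be checked numerically for a specific function. The sketch below is illustrative only: the test function $f(x) = e^{\sin x}$ is a hypothetical choice made here (it satisfies $f(-\pi) = f(\pi)$), and the integrals are approximated by the midpoint rule.

```python
import math

def integrate(h, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

def f(x):
    # Hypothetical test function: smooth, with f(-pi) = f(pi) = 1.
    return math.exp(math.sin(x))

def f_prime(x):
    return math.cos(x) * math.exp(math.sin(x))

def both_sides(n):
    """Return ((lhs, rhs), (lhs, rhs)) for the two equations in (5.3.6)."""
    a, b = -math.pi, math.pi
    first = (integrate(lambda x: f_prime(x) * math.sin(n * x), a, b),
             -n * integrate(lambda x: f(x) * math.cos(n * x), a, b))
    second = (integrate(lambda x: f_prime(x) * math.cos(n * x), a, b),
              n * integrate(lambda x: f(x) * math.sin(n * x), a, b))
    return first, second

for n in (1, 2, 3):
    print(n, both_sides(n))
```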


Exercise Set 5.3 


2) 
1. Find ij (2x sin 1/a — cos 1/a) dx. Hint: See Example 5.3.2 
4/n 


a pe 
2. Find & [ cos 1/tdt for x > 0. 
dx J, 


YP fect 
3. Find — [ sint? dt. 
dz Jy 
ap Re 
1. Find =f e® dt. 
dx Jif. 
5. If $f(x) = -1/x$, then $f'(x) = 1/x^2$. Thus, Theorem 5.3.1 seems to imply that
\[ \int_{-1}^{1} \frac{1}{x^2}\,dx = f(1) - f(-1) = -1 - 1 = -2. \]
However, $1/x^2$ is a positive function, and so its integral over $[-1,1]$ should be
positive. What is wrong?

6. If $f$ is a differentiable function on $[a,b]$ and $f'$ is integrable on $[a,b]$, then find
\[ \int_a^b f(x)\,f'(x)\,dx. \]


7. Let $f$ be a continuous function on the interval $[0,1]$. Express
\[ \int_0^{\pi/2} f(\sin\theta) \cos\theta\,d\theta \]
as an integral involving only the function $f$.


8. Find [ t" Intdé where n is an arbitrary integer. 
oO 


9. Prove that if $f$ is integrable on $[a,b]$ and $c \in [a,b]$, then changing the value of
$f$ at $c$ does not change the fact that $f$ is integrable or the value of its integral
on $[a,b]$.

10. The function $f(x) = x/|x|$ has derivative 0 everywhere but at $x = 0$. Its
derivative $f'(x) = 0$ is integrable on $[-1,1]$ and has integral 0. However $f(1) -
f(-1) = 1 - (-1) = 2$. This seems to contradict Theorem 5.3.1. Explain why
it does not.

11. The interval additivity property (Theorem 5.2.8) is stated for three points a, b,c 
satisfying a <b <c. Show that it actually holds regardless of how the points 
a, b, and c are ordered. Hint: You will need to consider various cases. 


12. Suppose $f$ is integrable on an interval $I$ containing $a$ and $b$ and $|f(x)| \le M$ on
$I$. Prove that
\[ \left| \int_a^b f(x)\,dx \right| \le M\,|b - a|. \]
Note that we do not assume that $a < b$.


5.4. Logs, Exponentials, Improper Integrals 


The following development of the log and exponential functions is the one presented 
in most calculus classes these days. It is such a beautiful application of the Second 
Fundamental Theorem that we felt obligated to include it here. 


The Natural Logarithm. One consequence of the Second Fundamental Theorem
is that every function $f$ which is continuous on an open interval $I$ has an
antiderivative on $I$. In fact, if $a$ is any point of $I$, then
\[ F(x) = \int_a^x f(t)\,dt \]
is an antiderivative for $f$ on $I$ (that is, $F'(x) = f(x)$ on $I$).
Now $\frac{x^{n+1}}{n+1}$ is an antiderivative for $x^n$ for all integers $n$ with the exception of
$n = -1$. However, since $x^{-1}$ is continuous on $(0,+\infty)$ and on $(-\infty,0)$, it has
an antiderivative on each of these intervals. There is no mystery about what the
antiderivatives are. On $(0,+\infty)$ the function
\[ \int_1^x \frac{1}{t}\,dt \]
is an antiderivative for $1/x$. Obviously, this function is important enough to deserve
a name.


Definition 5.4.1. We define the natural logarithm to be the function $\ln$, defined
for $x \in (0,+\infty)$ by
\[ \ln x = \int_1^x \frac{1}{t}\,dt. \]
This is the unique antiderivative for $1/x$ on $(0,+\infty)$ which has the value 0 when
$x = 1$.

On $(-\infty,0)$ an antiderivative for $1/x$ is given by
\[ \int_{-1}^{x} \frac{1}{t}\,dt. \]
Note that the $x$ that appears in this integral is negative, and so $-x = |x|$. If we
make the substitution $s = -t$, then Theorem 5.3.6 implies that
\[ \int_{-1}^{x} \frac{1}{t}\,dt = \int_1^{-x} \frac{1}{s}\,ds = \ln(-x) = \ln|x|. \]
Thus, $\ln|x|$ is an antiderivative for $1/x$ on both $(0,+\infty)$ and $(-\infty,0)$.


The next two theorems show that In has the key properties that we expect of 
a logarithm. 


Theorem 5.4.2. For all $a, b \in (0,+\infty)$, $\ln ab = \ln a + \ln b$.

Proof. By the Chain Rule, the derivative of $\ln ax$ is $\frac{1}{ax}\,a = \frac{1}{x}$. Thus, $\ln ax$ and
$\ln x$ have the same derivative on the interval $(0,+\infty)$. By Corollary 4.3.4
\[ \ln ax = \ln x + c \]
for some constant $c$. The constant may be evaluated by setting $x = 1$. Since
$\ln 1 = 0$, this tells us that $c = \ln a$. Thus,
\[ \ln ax = \ln x + \ln a. \]
This gives $\ln ab = \ln a + \ln b$ when we set $x = b$. $\square$
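Since $\ln$ is defined here as an integral, the product rule for logarithms can be tested directly from that definition. The sketch below is an approximation-based illustration: the midpoint rule, the number of subintervals, and the sample values of $a$ and $b$ are assumptions made here. It also compares the result with Python's `math.log`.

```python
import math

def ln(x, n=100000):
    """Approximate ln x = integral of 1/t from 1 to x (for x > 0) by the midpoint rule.
    For 0 < x < 1 the step is negative, which matches the convention that the
    integral from 1 to x is minus the integral from x to 1."""
    step = (x - 1.0) / n
    return step * sum(1.0 / (1.0 + (k + 0.5) * step) for k in range(n))

a, b = 2.5, 3.0
print(ln(a * b), ln(a) + ln(b))   # the two sides of Theorem 5.4.2
print(ln(a), math.log(a))         # comparison with the library logarithm
print(ln(0.4), math.log(0.4))     # a value below 1, where ln is negative
```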


Theorem 5.4.3. If $a > 0$ and $r$ is any rational number, then $\ln a^r = r \ln a$.

Proof. The proof of this is similar to the proof of the previous theorem. The key
is to compute the derivative of the function $\ln x^r$. We leave the details to Exercise
5.4.1. $\square$

Theorem 5.4.4. The natural logarithm is strictly increasing on $(0,+\infty)$. Also,
\[ \lim_{x \to +\infty} \ln x = +\infty \quad \text{and} \quad \lim_{x \to 0} \ln x = -\infty. \]
+00 230 


Proof. The function Ina is strictly increasing on (@, +00) because its derivative is 
positive on this interval. 


Since nl = @ and In is increasing, In2 is positive. Given any number M, 
choose an integer m such that mIn2 > M and set N = 2". Then 


Ing >In2"=mm2>M_ whenever «>N. 
Thi 


easily from lim,_,.eInz = +00 and properties of In. The details are left to the 


implies that lim, Ina = +00. The fact that lim,_,9Inx = —oo follows 


exercises. Oo 




The Exponential Function. The function $\ln$ is strictly increasing on $(0, +\infty)$ and, therefore, it has an inverse function. The image of $(0, +\infty)$ under $\ln$ is an open interval by Exercise 4.2.5. By Theorem 5.4.4 this open interval must be the interval $(-\infty, \infty)$. Therefore, the inverse function for $\ln$ has domain $(-\infty, \infty)$ and image $(0, \infty)$.

Definition 5.4.5. We define the exponential function to be the function with domain $(-\infty, \infty)$ which is the inverse function of $\ln$. We will denote it by $\exp x$.


The theorems we proved about In immediately translate into theorems about 
exp. 


Theorem 5.4.6. The function $\exp$ is its own derivative; that is, $\exp'(x) = \exp(x)$.

Proof. By Theorem 4.2.9 we have

\[ \exp'(x) = \frac{1}{\ln'(\exp(x))} = \frac{1}{1/\exp(x)} = \exp(x). \] □


Theorem 5.4.7. The exponential function satisfies
(a) $\exp(a + b) = \exp a \, \exp b$ for all $a, b \in \mathbb{R}$;
(b) $\exp(ra) = (\exp(a))^r$ for all $a \in \mathbb{R}$ and $r \in \mathbb{Q}$.

Proof. Let $x = \exp a$ and $y = \exp b$, so that $a = \ln x$ and $b = \ln y$. Then

\[ \exp(a + b) = \exp(\ln x + \ln y) = \exp(\ln xy) = xy = \exp a \, \exp b \]

by Theorem 5.4.2. This proves (a). The proof of (b) is similar and is left to the exercises. □


We define the number $e$ to be $\exp 1$, so that $\ln e = 1$. It follows from (b) of the above theorem that, if $r$ is a rational number, then

\[ e^r = (\exp 1)^r = \exp r. \tag{5.4.1} \]

Now at this point, $a^r$ is defined for every positive $a$ and rational $r$. We have not yet defined $a^x$ if $x$ is a real number which is not rational. However, $\exp x$ is defined for every real $x$. Since (5.4.1) tells us that $e^r = \exp r$ if $r$ is rational, it makes sense to define $e^x$ for any real $x$ to be $\exp x$.

More generally, if $a$ is any positive real number and $r$ is rational, then

\[ a^r = (\exp \ln a)^r = \exp(r \ln a), \]

and so it makes sense to define $a^x$ for any real $x$ to be $\exp(x \ln a)$. The following definition formalizes this discussion.

Definition 5.4.8. If $x$ is any real number and $a$ is a positive real number, we define $a^x$ by

\[ a^x = \exp(x \ln a). \]

In particular,

\[ e^x = \exp x. \]




With this definition of $a^x$, the laws of exponents

\[ a^{x+y} = a^x a^y \quad \text{and} \quad a^{xy} = (a^x)^y \]

are satisfied. The proofs are left to the exercises.
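The definition $a^x = \exp(x \ln a)$ and the two laws of exponents can be spot-checked numerically. The sketch below is only an illustration under floating-point arithmetic; the helper name `power` is an assumption introduced here, and a numerical check is of course not a proof.

```python
import math

def power(a, x):
    """a**x for a > 0, computed as exp(x * ln a) in the spirit of Definition 5.4.8."""
    if a <= 0:
        raise ValueError("the base a must be positive")
    return math.exp(x * math.log(a))

a, x, y = 2.0, 3.0, 1.5
# First law: a^(x+y) = a^x * a^y
print(power(a, x + y), power(a, x) * power(a, y))
# Second law: a^(xy) = (a^x)^y
print(power(a, x * y), power(power(a, x), y))
# Agreement with an irrational-looking exponent handled by the library
print(power(2.0, 0.5), math.sqrt(2.0))
```

The two numbers printed on each line agree up to rounding error.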


The General Logarithm. We define the logarithm to the base $a$, $\log_a$, to be the inverse function of the function $a^x$. The following theorem gives a simple description of it in terms of the natural logarithm $\ln x$. The proof is left to the exercises.

Theorem 5.4.9. For each $a > 0$ with $a \neq 1$, we have $\log_a x = \dfrac{\ln x}{\ln a}$.


Improper Integrals. So far, we have defined the integral $\int_a^b f(x)\,dx$ only for bounded intervals $[a, b]$ and bounded functions $f$ on $[a, b]$. Thus, our definition does
not allow for integrals such as 


Peta tf 
dx or ds. 
(een ee: 


It turns out that a perfectly good meaning can be attached to each of these integrals. To do so requires extending our definition of the integral.

We first define an integral of the form $\int_a^\infty f(x)\,dx$ where $a$ is finite. We assume that $f$ is integrable on each interval of the form $[a, s]$ for $a < s < \infty$. Then we set

\[ \int_a^\infty f(x)\,dx = \lim_{s \to \infty} \int_a^s f(x)\,dx, \]

provided this limit exists and is finite. In this case, we say that the improper integral $\int_a^\infty f(x)\,dx$ converges.

Integrals of the form $\int_{-\infty}^b f(x)\,dx$ are treated similarly. Assuming $f$ is integrable on each interval of the form $[r, b]$ with $-\infty < r < b$, we set

\[ \int_{-\infty}^b f(x)\,dx = \lim_{r \to -\infty} \int_r^b f(x)\,dx, \]

provided this limit exists and is finite. In this case, we say that the improper integral $\int_{-\infty}^b f(x)\,dx$ converges.

For an integral of the form $\int_{-\infty}^\infty f(x)\,dx$, we simply break the integral up into a sum of improper integrals involving only one infinite limit of integration. That is, we write

\[ \int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^0 f(x)\,dx + \int_0^\infty f(x)\,dx. \]

If the two improper integrals on the right converge, we then say the improper integral on the left converges, and it converges to the sum on the right.



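Example 5.4.10. Find $\int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx$ or show that it fails to converge.

Solution: We write

\[ \int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx = \int_{-\infty}^{0} \frac{1}{1+x^2}\,dx + \int_{0}^{\infty} \frac{1}{1+x^2}\,dx. \]

Then, since $\arctan'(x) = \frac{1}{1+x^2}$, the First Fundamental Theorem implies that

\[ \int_{-\infty}^{0} \frac{1}{1+x^2}\,dx = \lim_{r \to -\infty} \int_r^0 \frac{1}{1+x^2}\,dx = \lim_{r \to -\infty} (\arctan 0 - \arctan r) = \pi/2 \]

and

\[ \int_{0}^{\infty} \frac{1}{1+x^2}\,dx = \lim_{s \to \infty} \int_0^s \frac{1}{1+x^2}\,dx = \lim_{s \to \infty} (\arctan s - \arctan 0) = \pi/2. \]

Thus, $\int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx$ converges to $\pi$.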


Functions with Singularities. If a function $f$ is integrable on $[r, b]$ for every $r$ with $a < r < b$ but unbounded on the interval $(a, b)$, then it is not integrable on $[a, b]$. It is said to have a singularity at $a$. Still, its improper integral over $[a, b]$ may exist in the sense that

\[ \lim_{r \to a^+} \int_r^b f(x)\,dx \]

may exist and be finite. In this case we say that the improper integral $\int_a^b f(x)\,dx$ converges. Its value, of course, is the indicated limit.

Similarly, a function $f$ may be integrable on $[a, s]$ for every $s$ with $a < s < b$ but not bounded on $[a, b)$. In this case, its improper integral over $[a, b]$ is

\[ \lim_{s \to b^-} \int_a^s f(x)\,dx, \]

provided this limit exists and is finite.

It may be that the singular point for $f$ is an interior point $c$ of the interval over which we wish to integrate $f$. That is, it may be that $a < c < b$ and $f$ is integrable on closed subintervals of $[a, b]$ that don't contain $c$, but $f$ blows up at $c$. In this case, we write

\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \]

If the two improper integrals on the right converge, then we say the improper integral on the left converges and it converges to the sum on the right.


Example 5.4.11. Find $\int_{-1}^{1} x^{-1/3}\,dx$.

Solution: Here the integrand blows up at $0$. An antiderivative for $x^{-1/3}$ is $\frac{3}{2} x^{2/3}$. Thus,

\[ \int_{-1}^{0} x^{-1/3}\,dx = \lim_{s \to 0^-} \frac{3}{2}\left(s^{2/3} - (-1)^{2/3}\right) = -\frac{3}{2}, \]

while

\[ \int_{0}^{1} x^{-1/3}\,dx = \lim_{r \to 0^+} \frac{3}{2}\left(1^{2/3} - r^{2/3}\right) = \frac{3}{2}. \]

Thus,

\[ \int_{-1}^{1} x^{-1/3}\,dx = \int_{-1}^{0} x^{-1/3}\,dx + \int_{0}^{1} x^{-1/3}\,dx \]

converges to $-\frac{3}{2} + \frac{3}{2} = 0$.


The following is a theorem which can be used to conclude that an improper integral converges without actually carrying out the integration.

Theorem 5.4.12. Let $\int_a^b f(x)\,dx$ be an improper integral (improper due to the fact that $a = -\infty$ or $b = \infty$ or $f$ has a singularity at $a$ or $f$ has a singularity at $b$). If $g$ is a non-negative function such that $|f(x)| \le g(x)$ for all $x \in (a, b)$ and if

\[ \int_a^b g(x)\,dx \]

converges, then

\[ \int_a^b f(x)\,dx \]

also converges.

Proof. We will prove this in the case where the bad point is $b$: either $b = \infty$ or $f$ blows up at $b$. The case where $a$ is the bad point is entirely analogous.

Let $h(x) = f(x) + |f(x)|$. Then $0 \le h(x) \le 2g(x)$ for all $x \in (a, b)$. So

\[ H(s) = \int_a^s h(x)\,dx \quad \text{and} \quad \int_a^s g(x)\,dx \]

are non-decreasing functions of $s$ (Exercise 5.4.16) and

\[ H(s) \le 2 \int_a^s g(x)\,dx \le 2 \int_a^b g(x)\,dx. \]

The integral on the right is finite by hypothesis. It follows that the non-decreasing function $H(s)$ is bounded above. By Exercise 4.1.13, $\lim_{s \to b} H(s)$ converges. Hence, the improper integral $\int_a^b h(x)\,dx$ converges.

The same argument with $h$ replaced by $|f(x)|$ shows that $\int_a^b |f(x)|\,dx$ converges. Since $f = h - |f|$, it follows that $\int_a^b f(x)\,dx$ also converges. □




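Example 5.4.13. Determine whether $\int_{-\infty}^{\infty} e^{-x^2}\,dx$ converges.

Solution: Since $e^{-x^2} \le \frac{1}{1+x^2}$ (by Exercise 4.4.3) and each of

\[ \int_{-\infty}^{0} \frac{1}{1+x^2}\,dx \quad \text{and} \quad \int_{0}^{\infty} \frac{1}{1+x^2}\,dx \]

converges by Example 5.4.10, the same is true of the corresponding integrals for $e^{-x^2}$. It follows that $\int_{-\infty}^{\infty} e^{-x^2}\,dx$ converges.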
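The comparison used in Example 5.4.13 can be explored numerically. The sketch below checks the inequality $e^{-x^2} \le \frac{1}{1+x^2}$ on a grid and evaluates the truncated integrals of $e^{-x^2}$ over $[-s, s]$. It relies on the standard identity $\int_{-s}^{s} e^{-x^2}\,dx = \sqrt{\pi}\,\operatorname{erf}(s)$, which is not taken from the text; the function names and the grid are illustrative assumptions.

```python
import math

def gauss(x):
    return math.exp(-x * x)

def bound(x):
    return 1.0 / (1.0 + x * x)

# Check e^{-x^2} <= 1/(1+x^2) at the grid points -10.00, -9.99, ..., 10.00.
grid = [i / 100.0 for i in range(-1000, 1001)]
dominated = all(gauss(x) <= bound(x) for x in grid)
print("inequality holds on the grid:", dominated)

def truncated_gauss_integral(s):
    """Integral of e^{-x^2} over [-s, s], via the error function (assumed identity)."""
    return math.sqrt(math.pi) * math.erf(s)

for s in (1.0, 2.0, 5.0, 10.0):
    # The truncated integrals increase with s and stay below pi,
    # the value of the dominating integral found in Example 5.4.10.
    print(s, truncated_gauss_integral(s), math.pi)
```

The truncated integrals are non-decreasing in $s$ and bounded above, which is exactly the situation the proof of Theorem 5.4.12 exploits.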


Cauchy Principal Value. Note that we break an improper integral of the form

\[ \int_{-\infty}^{\infty} f(x)\,dx \tag{5.4.2} \]

up into the sum of $\int_{-\infty}^{0} f(x)\,dx$ and $\int_{0}^{\infty} f(x)\,dx$ and then require that each of these improper integrals converges before we are willing to say that (5.4.2) converges. This ensures that

\[ \lim_{a, b \to \infty} \int_{-a}^{b} f(x)\,dx \]

exists and is the same number, independently of how $a$ and $b$ approach $\infty$. This is a strong requirement. In many situations, the improper integral in this sense will fail to converge even though the limit may exist if $(a, b)$ is constrained to lie along some line in the plane. Of special interest is the case when $a$ and $b$ are constrained to be equal. This leads to

\[ \lim_{a \to \infty} \int_{-a}^{a} f(x)\,dx. \]

If this limit exists, then we say that the Cauchy principal value of the improper integral (5.4.2) exists. Similarly, the Cauchy principal value of an integral over an interval $[a, b]$ on which $f$ has a singularity at an interior point $c$ is

\[ \lim_{r \to 0^+} \left[ \int_a^{c-r} f(x)\,dx + \int_{c+r}^{b} f(x)\,dx \right] \]

if this limit exists. The existence of the Cauchy principal value is much weaker than ordinary convergence for an improper integral.


Example 5.4.14. Show that the improper integral

\[ \int_{-\infty}^{\infty} \frac{x}{1+x^2}\,dx \]

does not converge but it does have a Cauchy principal value.

Solution: We have

\[ \int_{-\infty}^{\infty} \frac{x}{1+x^2}\,dx = \lim_{a \to \infty} \int_{-a}^{0} \frac{x}{1+x^2}\,dx + \lim_{b \to \infty} \int_{0}^{b} \frac{x}{1+x^2}\,dx. \]

The first of the above limits is $\lim_{a \to \infty} -\frac{1}{2}\ln(1 + a^2) = -\infty$ while the second is $\lim_{b \to \infty} \frac{1}{2}\ln(1 + b^2) = \infty$. Neither of these converges and so the improper integral does not converge. However, the Cauchy principal value is

\[ \lim_{a \to \infty} \int_{-a}^{a} \frac{x}{1+x^2}\,dx = \lim_{a \to \infty} \frac{1}{2}\left(\ln(1 + a^2) - \ln(1 + a^2)\right) = 0. \]
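To see concretely why the principal value is weaker than convergence, one can compare symmetric and asymmetric truncations of the integral in Example 5.4.14. The sketch below uses the antiderivative $\frac{1}{2}\ln(1 + x^2)$ from the solution; the function name `truncated_integral` and the choice of the asymmetric window $[-a, 2a]$ are illustrative assumptions.

```python
import math

def truncated_integral(lower, upper):
    """Integral of x/(1+x^2) over [lower, upper], from the antiderivative (1/2) ln(1+x^2)."""
    return 0.5 * (math.log1p(upper * upper) - math.log1p(lower * lower))

for a in (10.0, 1e3, 1e6):
    symmetric = truncated_integral(-a, a)        # always 0: the principal value
    lopsided = truncated_integral(-a, 2.0 * a)   # tends to ln 2, not 0
    print(a, symmetric, lopsided)

print("ln 2 =", math.log(2.0))
```

The symmetric truncations are identically $0$, while the truncations over $[-a, 2a]$ approach $\frac{1}{2}\ln 4 = \ln 2$. The limit therefore depends on how the two endpoints go to infinity, which is why the improper integral itself does not converge.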


Exercise Set 5.4

1. Supply the details for the proof of Theorem 5.4.3.

2. Prove that $\ln\left(\frac{a}{b}\right) = \ln a - \ln b$ for all $a, b \in (0, +\infty)$.

3. Finish the proof of Theorem 5.4.4 by showing that $\lim_{x \to 0} \ln x = -\infty$. Hint: This follows easily from $\lim_{x \to \infty} \ln x = +\infty$ and properties of $\ln$.

4. Prove part (b) of Theorem 5.4.7.

5. Using Definition 5.4.8 and the properties of $\exp$ prove the laws of exponents: $a^{x+y} = a^x a^y$ and $a^{xy} = (a^x)^y$.

6. Compute the derivative of $a^x$ for each $a > 0$.

7. Find an antiderivative for $a^x$ for each $a > 0$.

8. Prove Theorem 5.4.9.

9. For which values of $p > 0$ does the improper integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ converge? Justify your answer.

10. For which values of $p > 0$ does the improper integral $\int_0^1 \frac{1}{x^p}\,dx$ converge? Justify your answer.


sin 
11. Show that / Tat tir converges. Can you tell what it converges to? 
z 
aes 


12. Does the improper integral $\int_0^1 \ln x\,dx$ converge? If so, what does it converge to?

13. Suppose that $f$ and $g$ are non-negative functions on $\mathbb{R}$ which are integrable on each finite interval $[a, b]$ and that $f(x) \le g(x)$ for all $x \in \mathbb{R}$. Show that if the improper integral $\int_{-\infty}^{\infty} f(x)\,dx$ diverges, then so does the improper integral $\int_{-\infty}^{\infty} g(x)\,dx$.

14. Prove that if $f$ is integrable on every interval $[a, b]$ in $\mathbb{R}$ and if $f$ is an odd function, then $\int_{-\infty}^{\infty} f(x)\,dx$ has Cauchy principal value $0$.




1/3 


vite 


15. Prove that the improper integral dx does not converge but that 


it has Cauchy principal value 0. 
16. Prove that if $f$ is an integrable function on every interval $[a, s]$ with $s < b$ and if $f(x) \ge 0$ on $[a, b)$, then the function $F(s) = \int_a^s f(x)\,dx$ is a non-decreasing function on $[a, b)$.


Chapter 6 


Infinite Series 


Infinite series play a fundamental role in mathematics. They are used to approxi- 
mate complicated or uncomputable quantities or functions by simpler quantities or 
functions. They are widely used by engineers and scientists in real-world applica- 
tions of mathematics. 


6.1. Convergence of Infinite Series 


An infinite series of numbers is a formal sum

\[ \sum_{k=1}^{\infty} a_k = a_1 + a_2 + a_3 + \cdots + a_k + \cdots \tag{6.1.1} \]

of an infinite sequence of numbers $\{a_k\}$ called the terms of the series. We say formal sum because the actual sum may or may not exist. What does it mean for the actual sum to exist? To answer this, we proceed in much the same way that we did in defining improper integrals. We cut off the sum after some finite number $n$ of terms and then take the limit as $n \to \infty$. That is, we set

\[ s_n = \sum_{k=1}^{n} a_k = a_1 + a_2 + a_3 + \cdots + a_n. \tag{6.1.2} \]

The number $s_n$ is called the $n$th partial sum of the series.

Definition 6.1.1. The series (6.1.1) is said to converge to the number $s$ if $\lim s_n = s$. In this case we write

\[ \sum_{k=1}^{\infty} a_k = s. \]

The number $s$ is called the sum of the series. If the sequence $\{s_n\}$ diverges, then we say the series (6.1.1) diverges.




It is important to keep firmly in mind the difference between a sequence and a series. A series is a formal sum of a sequence of numbers. Each series

\[ a_1 + a_2 + a_3 + \cdots + a_k + \cdots \]

has two sequences associated to it: the sequence of terms $\{a_n\}$ and the sequence of partial sums $\{s_n\}$, where $s_n = a_1 + a_2 + \cdots + a_n$.

A series (6.1.1) converges if and only if its sequence of partial sums converges. What about the sequence of terms $\{a_n\}$? What is the relationship between convergence of the series and convergence of its sequence of terms? The following theorem gives a partial answer.

Theorem 6.1.2 (Term Test). If a series $a_1 + a_2 + a_3 + \cdots + a_k + \cdots$ converges, then $\lim a_n = 0$.

Proof. If the series converges to $s$, then $\lim s_n = s$, where $\{s_n\}$ is the sequence of partial sums (6.1.2). However, $a_n = s_n - s_{n-1}$ if $n > 1$, and so

\[ \lim a_n = \lim s_n - \lim s_{n-1} = s - s = 0. \] □


The above theorem is called the term test because it provides a test that the terms of a series must pass if the series converges. If the series fails this test (that is, if $\lim a_n$ either fails to exist or is not $0$ if it does exist), then the series diverges. However, this test can never be used to prove that a series converges, since it does not say that if $\lim a_n = 0$, then the series converges. In fact, the series

\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k} + \cdots \]

has a sequence of terms $\{1/k\}$ which converges to $0$, but the series itself does not converge. This series is called the harmonic series. To see that it diverges, group the terms in the following way:

\[ 1 + \left(\frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots. \]

Each group in parentheses is a sum of $2^n$ terms each of which is at least as big as $1/2^{n+1}$. Thus, each group in parentheses sums to a number greater than or equal to $1/2$. It follows that the $2^n$th partial sum of the harmonic series is at least $n/2$. Thus, the sequence of partial sums has limit $+\infty$, and so the series diverges.
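The grouping argument can be watched in action. The following sketch computes the $2^n$th partial sums of the harmonic series and compares them with the lower bound $1 + n/2$ that the grouping gives (the leading term $1$ plus $n$ groups each at least $1/2$), which is slightly stronger than the bound $n/2$ stated above. The function name is an illustrative assumption.

```python
def harmonic_partial_sum(n):
    """Sum of 1/k for k = 1, ..., n."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in range(0, 15):
    s = harmonic_partial_sum(2 ** n)
    # Columns: n, number of terms, partial sum, lower bound from the grouping
    print(n, 2 ** n, round(s, 4), 1 + n / 2)
```

The partial sums grow without bound, but only logarithmically in the number of terms, which is why the divergence is invisible to casual numerical experiments.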


Example 6.1.3. Does the series $\sum_{k=1}^{\infty} \frac{k}{2k+1}$ converge?

Solution: No. Its sequence of terms is $\left\{\frac{k}{2k+1}\right\}$ and this sequence has limit $1/2$ as $k \to \infty$. Since the sequence of terms does not converge to $0$, the series fails the term test, and so it diverges.

Example 6.1.4. Does the term test tell us whether $\sum_{k=1}^{\infty} \frac{k}{k^2+1}$ converges?

Solution: If we apply the term test, the result is

\[ \lim \frac{k}{k^2+1} = \lim \frac{1/k}{1 + 1/k^2} = 0. \]

The fact that this limit is $0$ tells us nothing. The series may or may not converge (in fact, in Example 6.1.14 we will prove that it diverges).


Remark 6.1.5. Although, in our discussion so far, we have assumed that the index of summation $k$ for a series runs from $1$ to $\infty$, there is really no reason to start the summation at $k = 1$. It could just as easily start at $k = 0$, $k = 2$, or $k = 100$. Our discussion of convergence for series is not affected by where the summation begins, since the only effect on the partial sums $s_n$ of changing the starting point will be to add the same constant to each of them.


Geometric Series. The simplest meaningful series is also one of the most useful. This is the geometric series

\[ \sum_{k=0}^{\infty} a r^k = a + ar + ar^2 + \cdots + ar^k + \cdots. \tag{6.1.3} \]

Here $a$ and $r$ are any two real numbers. The number $a$ is the initial term of the series, while the number $r$ is called the ratio for the geometric series, since, for $k \ge 1$, it is the ratio of the $k$th term $ar^k$ to the previous term $ar^{k-1}$. It is the fact that this ratio is independent of $k$ that characterizes the geometric series.

Theorem 6.1.6. If $a \ne 0$, the geometric series (6.1.3) converges to $\dfrac{a}{1-r}$ if $|r| < 1$ and diverges if $|r| \ge 1$.

Proof. The series fails the term test if $|r| \ge 1$, since $\lim ar^k \ne 0$ in this case. Thus, the geometric series diverges if $|r| \ge 1$.

Assume $|r| < 1$. If $s_n = a + ar + ar^2 + \cdots + ar^n$ is the $n$th partial sum of the series, then

\[ r s_n = ar + ar^2 + ar^3 + \cdots + ar^{n+1} \]

and so

\[ (1 - r) s_n = s_n - r s_n = a - ar^{n+1}. \]

Thus, since $r \ne 1$, we may divide by $1 - r$ to obtain

\[ s_n = \frac{a - ar^{n+1}}{1 - r}. \]

This sequence converges to $\dfrac{a}{1-r}$ since $\lim r^{n+1} = 0$. □

Example 6.1.7. Does the series $1/2 + 1/4 + 1/8 + \cdots + 1/2^k + \cdots$ converge? If so, what does it converge to?

Solution: This is a geometric series with ratio $r = 1/2$ and initial term $a = 1/2$. Thus, it converges to $\dfrac{1/2}{1 - 1/2} = 1$, by the previous theorem.
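The closed form for the partial sums in the proof of Theorem 6.1.6 is easy to test against a direct summation. The sketch below does this for the series of Example 6.1.7; the function names are illustrative assumptions.

```python
def geometric_partial_sum(a, r, n):
    """s_n = a + a*r + ... + a*r**n, summed term by term."""
    return sum(a * r ** k for k in range(n + 1))

def closed_form(a, r, n):
    """s_n = (a - a*r**(n+1)) / (1 - r), valid for r != 1."""
    if r == 1:
        raise ValueError("the closed form requires r != 1")
    return (a - a * r ** (n + 1)) / (1 - r)

a, r = 0.5, 0.5  # the series 1/2 + 1/4 + 1/8 + ...
for n in (0, 1, 5, 20, 50):
    print(n, geometric_partial_sum(a, r, n), closed_form(a, r, n), a / (1 - r))
```

Both columns of partial sums agree and approach $a/(1-r) = 1$, with the gap $a r^{n+1}/(1-r)$ shrinking geometrically.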


Series with Non-Negative Terms. Let $a_1 + a_2 + \cdots + a_k + \cdots$ be a series with $a_k \ge 0$ for all $k$. Then, its sequence $\{s_n\}$ of partial sums satisfies

\[ s_{n+1} = s_n + a_{n+1} \ge s_n. \]

That is, it is a non-decreasing sequence. If such a sequence is bounded above, then it converges by Theorem 2.4.1. If it is not bounded above, then it has limit $+\infty$. This proves the following theorem.

Theorem 6.1.8. An infinite series of non-negative terms converges if and only if its sequence of partial sums is bounded above.


Comparison Test. The comparison test stated in most calculus texts follows easily from the preceding theorem (see Exercise 6.1.11). With a little more work, the following, more general, version of the comparison test can also be proved this way. We give a different proof, based on Cauchy's criterion for convergence.

Theorem 6.1.9 (Comparison Test). Suppose $a_1 + a_2 + \cdots + a_k + \cdots$ and $b_1 + b_2 + \cdots + b_k + \cdots$ are series, with $b_k \ge 0$ for all $k$, and suppose there are positive constants $K$ and $M$ such that

\[ |a_k| \le M b_k \quad \text{for all } k \ge K. \tag{6.1.4} \]

Then if $b_1 + b_2 + \cdots + b_k + \cdots$ converges, so does $a_1 + a_2 + \cdots + a_k + \cdots$.

Proof. Let $s_n = \sum_{k=1}^{n} a_k$ and $t_n = \sum_{k=1}^{n} b_k$ be the $n$th partial sums for the two series. If the series with terms $b_k$ converges, then the sequence $\{t_n\}$ converges and, hence, is Cauchy. This implies that, given $\epsilon > 0$, there is an $N$ such that

\[ \sum_{k=n+1}^{m} b_k = |t_m - t_n| < \frac{\epsilon}{M} \quad \text{whenever } m > n \ge N. \]

Then (6.1.4) implies that

\[ |s_m - s_n| = \left| \sum_{k=n+1}^{m} a_k \right| \le \sum_{k=n+1}^{m} |a_k| \le M \sum_{k=n+1}^{m} b_k < \epsilon \]

whenever $m > n \ge \max(N, K)$. This implies that $\{s_n\}$ is a Cauchy sequence and, hence, converges. It follows that the series $\sum_{k=1}^{\infty} a_k$ converges. □

Suppose $\sum_{k=1}^{\infty} a_k$ is an arbitrary series. If we set $b_k = |a_k|$, then the condition $|a_k| \le M b_k$ of the previous theorem is satisfied with $M = 1$ and $K = 1$. This observation yields the following corollary.

Corollary 6.1.10. If $\sum_{k=1}^{\infty} |a_k|$ converges, then so does $\sum_{k=1}^{\infty} a_k$.

This leads to the following definition.

Definition 6.1.11. A series $\sum_{k=1}^{\infty} a_k$ is said to converge absolutely if the series $\sum_{k=1}^{\infty} |a_k|$ converges.

Thus, Corollary 6.1.10 asserts that if a series converges absolutely, then it converges.




eS 
k 
Example 6.1.12. Does the series }~ ag converge? Why? 
k=l 
Solution: Since lim 5 pees = 0 (L’Hepital’s Rule), there is an N such that 
s vh k>N 
RD <1 whenever >N. 
Then 
2 ds Be a ahenvens ESN, 
oF — Dyes = Wa whenever N. 
— 1 Ak 
Since the series )  —— is a convergent geometric series, the series > <- con- 
2 We i te 


verges by the comparison test. 


oo 
k 
Example 6.1.13. Does the series SO sr converge? Why? 
k=l 
Sk 
Solution: By the previous exercise, the series Se sr 


k=1 


converges and this means 


i 


eo 

k 

that Ye He ag converges absolutely and, hence, converges by Corollary 6.1.10, 
k=1 


The comparison test can also be used to prove that a series diverges.

Example 6.1.14. Prove that the series $\sum_{k=1}^{\infty} \frac{k}{k^2+1}$ diverges.

Solution: We compare with the harmonic series. Since $k^2 + 1 \le 2k^2$ for $k \in \mathbb{N}$, we have

\[ \frac{1}{2k} \le \frac{k}{k^2+1} \quad \text{for all } k \in \mathbb{N}. \]

If the series $\sum_{k=1}^{\infty} \frac{k}{k^2+1}$ converges, then so does $\sum_{k=1}^{\infty} \frac{1}{k}$ by the comparison test. However, the harmonic series diverges. Therefore $\sum_{k=1}^{\infty} \frac{k}{k^2+1}$ also diverges.


Exercise Set 6.1 


In each of the following six exercises, determine whether the indicated series con- 
verges. Justify your answer. 


4 


oo 


1 
See 




re 


z 
eras 


In each of the next four exercises, determine whether the indicated series converges 
absolutely. Justify your answer. 


1. 


12. 


13. 


le 


Y-2/3)%. 


(1 
In(i+k)" 


11. Prove the following weak version of the comparison test using Theorem 6.1.8: if $a_1 + a_2 + \cdots + a_k + \cdots$ and $b_1 + b_2 + \cdots + b_k + \cdots$ are series of non-negative terms with $a_k \le b_k$ for all $k$, then if $b_1 + b_2 + \cdots + b_k + \cdots$ converges, so does $a_1 + a_2 + \cdots + a_k + \cdots$.

12. Consider the decimal expansion $.d_1 d_2 d_3 d_4 \cdots$ of a real number between $0$ and $1$, where $\{d_k\}$ is a sequence of integers between $0$ and $9$. This decimal expansion represents the sum of a certain infinite series. What series is it and why does it converge?

13. Show that every real number in the interval $[0, 1]$ has a decimal expansion as described in the previous exercise.

14. Define a sequence $\{a_k\}$ inductively by setting $a_1 = 1/3$ and, if the first $k$ terms have been chosen, then choose $a_{k+1}$ to be one third of what is left after the sum of the first $k$ terms is subtracted from $1$. Does the series $\sum_{k=1}^{\infty} a_k$ converge? If so, what does it converge to?


6.2. Tests for Convergence 


In this section we will develop the standard tests for convergence of infinite series. 
Most of these are based on Theorem 6.1.8 or Theorem 6.1.9. 




Integral Test.

Theorem 6.2.1. Suppose $f$ is a positive, non-increasing function on $[1, \infty)$ and $a_k = f(k)$ for each $k \in \mathbb{N}$. Then the series $\sum_{k=1}^{\infty} a_k$ converges if and only if the improper integral $\int_1^{\infty} f(x)\,dx$ converges.

Proof. Consider the function $g(x)$ on $[1, \infty)$ which, for each $k \in \mathbb{N}$, is constant on the interval $[k, k+1)$ and equal to $f(k)$ at $k$. That is,

\[ g(x) = f(k) = a_k \quad \text{if } k \le x < k + 1, \; k \in \mathbb{N}. \]

This is a piecewise continuous function, hence integrable on any finite interval $[1, b]$. Also, since $f$ is non-increasing, it follows that

\[ g(x + 1) \le f(x) \le g(x) \quad \text{for all } x \in [1, \infty) \]

(see Figure 6.2.1). On integrating from $1$ to $n$, this yields

\[ \int_1^n g(x + 1)\,dx \le \int_1^n f(x)\,dx \le \int_1^n g(x)\,dx. \]

However, by Exercise 6.2.9,

\[ \int_1^n g(x + 1)\,dx = \sum_{k=2}^{n} a_k \quad \text{and} \quad \int_1^n g(x)\,dx = \sum_{k=1}^{n-1} a_k. \tag{6.2.1} \]

If $s_n = \sum_{k=1}^{n} a_k$, this implies that

\[ s_n - a_1 \le \int_1^n f(x)\,dx \le s_{n-1}. \]

It follows that the sequence of partial sums $\{s_n\}$ is bounded above if and only if the increasing function of $b$, $\int_1^b f(x)\,dx$, is bounded above. A non-decreasing sequence converges if and only if it is bounded above and a non-decreasing function on $[1, \infty)$ has a finite limit at $\infty$ if and only if it is bounded above (Exercise 4.1.13). Thus, the series converges if and only if the improper integral converges. □
Example 6.2.2. A p-series is a series of the form $\sum_{k=1}^{\infty} \frac{1}{k^p}$, where $p > 0$. Prove that a p-series converges if and only if $p > 1$.

Solution: We apply the integral test for the function $f(x) = 1/x^p$. Note that this is a positive, decreasing function on $[1, \infty)$ and $f(k) = 1/k^p$ for $k \in \mathbb{N}$. If $p \ne 1$, we have

\[ \int_1^b \frac{1}{x^p}\,dx = \frac{b^{1-p} - 1}{1 - p}. \]

As $b \to \infty$, this has limit $\frac{1}{p - 1}$ if $p > 1$ and limit $+\infty$ if $p < 1$. Thus, the p-series converges for $p > 1$ and diverges for $p < 1$ by the integral test.
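The two-sided estimate $s_n - a_1 \le \int_1^n f(x)\,dx \le s_{n-1}$ from the proof of the integral test can be checked directly for a p-series. The sketch below uses the closed form for $\int_1^n x^{-p}\,dx$ with $p \ne 1$; the function names and the sample values of $p$ and $n$ are illustrative assumptions.

```python
def partial_sum(n, p):
    """s_n = sum of 1/k**p for k = 1, ..., n."""
    return sum(1.0 / k ** p for k in range(1, n + 1))

def integral_1_to_n(n, p):
    """Integral of x**(-p) over [1, n] for p != 1: (n**(1-p) - 1) / (1 - p)."""
    if p == 1:
        raise ValueError("use ln(n) when p == 1")
    return (n ** (1 - p) - 1) / (1 - p)

for p in (2.0, 0.5):
    for n in (2, 10, 1000):
        lower = partial_sum(n, p) - 1.0   # s_n - a_1, with a_1 = 1
        middle = integral_1_to_n(n, p)
        upper = partial_sum(n - 1, p)     # s_{n-1}
        print(p, n, lower, middle, upper, lower <= middle <= upper)
```

For $p = 2$ all three quantities stay bounded as $n$ grows, while for $p = 1/2$ all three grow without bound, matching the conclusion of Example 6.2.2.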


Figure 6.2.1. Set-up for Proof of the Integral Test.


For $p = 1$, the p-series is the harmonic series and we already know it diverges. However, it is instructive to see how this follows from the integral test.

In the case $p = 1$, the function $f$ is $f(x) = 1/x$. We have

\[ \int_1^b \frac{1}{x}\,dx = \ln b \]

and this has limit $+\infty$ as $b \to \infty$. Thus, applying the integral test gives another proof that the harmonic series $\sum_{k=1}^{\infty} \frac{1}{k}$ diverges.

Example 6.2.3. Does the series $\sum_{k=1}^{\infty} \frac{3\sqrt{k}}{2k^2 - 1}$ converge or diverge? Justify your answer.

Solution: For large $k$, $\frac{3\sqrt{k}}{2k^2 - 1}$ is close to $\frac{3}{2k^{3/2}}$. This suggests we do a comparison with the p-series $\sum_{k=1}^{\infty} \frac{1}{k^{3/2}}$.

We have $2k^2 - 1 \ge k^2$ for all $k \ge 1$ and so

\[ \frac{3\sqrt{k}}{2k^2 - 1} \le \frac{3\sqrt{k}}{k^2} = \frac{3}{k^{3/2}}. \]

Since the p-series with $p = 3/2$ converges, so does our series, by the comparison test.


Root Test. This test is particularly important in the study of power series.

Theorem 6.2.4. Given an infinite series $\sum_{k=1}^{\infty} a_k$, let

\[ \rho = \limsup |a_k|^{1/k}. \]

Then the series converges absolutely if $\rho < 1$ and diverges if $\rho > 1$.

Proof. Recall that

\[ \limsup |a_k|^{1/k} = \lim t_n, \quad \text{where } t_n = \sup\{|a_k|^{1/k} : k \ge n\}. \]

Also recall that $\{t_n\}$ is a non-increasing sequence. Thus, if $\rho > 1$, then

\[ t_n = \sup\{|a_k|^{1/k} : k \ge n\} > 1 \quad \text{for all } n \in \mathbb{N}. \]

This means that, for every $n \in \mathbb{N}$, there is a $k \ge n$ such that $|a_k|^{1/k} > 1$. Then $|a_k| > 1$ also. It follows that the sequence of terms $\{a_k\}$ does not have limit $0$. Hence, the series fails the term test and must diverge in this case.

If $\rho < 1$, we can choose $r$ such that $\rho < r < 1$. Then there is an $N$ such that

\[ t_n < r \quad \text{whenever } n \ge N \]

and this implies that

\[ |a_k|^{1/k} < r \quad \text{whenever } k \ge N. \]

This, in turn, implies that

\[ |a_k| < r^k \quad \text{whenever } k \ge N. \]

Thus, the series $\sum_{k=1}^{\infty} |a_k|$ converges in this case, by comparison with the geometric series with ratio $r < 1$. Therefore, the original series converges absolutely. □

Note that the root test tells us nothing about convergence if the number $\rho$ turns out to be $1$.

Example 6.2.5. Does the series $\sum_{k=1}^{\infty} k(9/10)^k$ converge? Why?

Solution: We apply the root test. In this case, the lim sup of Theorem 6.2.4 is actually a limit, since the limit exists. In fact,

\[ \rho = \lim k^{1/k}(9/10) = (9/10) \lim k^{1/k} = 9/10 < 1, \]

since $\lim k^{1/k} = 1$ by Exercise 2.3.12. By the root test, the series converges.


Ratio Test.

Theorem 6.2.6. Given a series $\sum_{k=1}^{\infty} a_k$, let

\[ r = \lim \frac{|a_{k+1}|}{|a_k|}, \tag{6.2.2} \]

provided this limit exists. Then the series converges absolutely if $r < 1$ and diverges if $r > 1$.

Proof. Observe first that, for the limit defining $r$ to exist, the numbers $a_k$ must eventually be all non-zero; otherwise, the ratio $|a_{k+1}|/|a_k|$ would be undefined or $+\infty$ for infinitely many $k$.

If $r > 1$, then there is an $N$ such that

\[ |a_k| > 0 \quad \text{and} \quad \frac{|a_{k+1}|}{|a_k|} > 1 \quad \text{for all } k \ge N. \]

Then, for $k > N$,

\[ |a_k| = \frac{|a_k|}{|a_{k-1}|} \cdot \frac{|a_{k-1}|}{|a_{k-2}|} \cdots \frac{|a_{N+1}|}{|a_N|} \, |a_N| > |a_N|. \]

This implies that the sequence of terms $\{a_k\}$ fails to have limit $0$, and the series diverges by the term test.

If $r < 1$, we choose a $t$ such that $r < t < 1$. Since (6.2.2) holds, there is an $N$ such that

\[ \frac{|a_{k+1}|}{|a_k|} < t \quad \text{whenever } k \ge N. \]

Then, for $k > N$,

\[ |a_k| = \frac{|a_k|}{|a_{k-1}|} \cdot \frac{|a_{k-1}|}{|a_{k-2}|} \cdots \frac{|a_{N+1}|}{|a_N|} \, |a_N| < t^{k-N} |a_N|. \]

Thus, $|a_k| < \frac{|a_N|}{t^N}\, t^k$ whenever $k > N$. By comparison with the geometric series with ratio $t$, the series converges absolutely. □


The ratio test tends to work well on series where the terms a, involve products 
of an increasing number of factors — things like factorials. These are generally more 
difficult to attack with the root test than with the ratio test. 


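Example 6.2.7. Does the series $\sum_{k=1}^{\infty} \frac{k!}{k^k}$ converge? Why?

Solution: We apply the ratio test:

\[ r = \lim \frac{(k+1)!}{(k+1)^{k+1}} \cdot \frac{k^k}{k!} = \lim \frac{(k+1)\,k^k}{(k+1)^{k+1}} = \lim \frac{k^k}{(k+1)^k} = \lim \frac{1}{(1 + 1/k)^k} = \frac{1}{e} < 1 \]

(see Example 4.4.6). Hence, the series converges by the ratio test.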


For many series, the ratio test and the root test work equally well. However, 
the ratio test is not applicable in many situations where the root test works well. 


Example 6.2.8. Prove that the series $1/3 + 1/2^2 + 1/3^3 + 1/2^4 + 1/3^5 + \cdots$ converges.

Solution: This one can easily be done using the comparison test. However, it is instructive to see how attempts to use the ratio test and root test work out. The ratio test doesn't work, because the successive ratios are

\[ 3/4, \; 4/27, \; 27/16, \; 16/243, \; 243/64, \ldots, \]

and this sequence of numbers has no limit.

On the other hand, the root test yields that $\rho$ is the lim sup of the sequence

\[ 1/3, \; 1/2, \; 1/3, \; 1/2, \; 1/3, \ldots. \]

That is, $\rho = 1/2$. Therefore, the series converges by the root test.
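The ratios and roots quoted in Example 6.2.8 can be reproduced with exact rational arithmetic. The sketch below is an illustration only; the helper name `term` and the indexing from $k = 1$ are assumptions made here.

```python
from fractions import Fraction

def term(k):
    """k-th term of 1/3 + 1/2^2 + 1/3^3 + 1/2^4 + ... for k = 1, 2, 3, ..."""
    base = 3 if k % 2 == 1 else 2   # odd k uses base 3, even k uses base 2
    return Fraction(1, base ** k)

# Successive ratios a_{k+1} / a_k, computed exactly
ratios = [term(k + 1) / term(k) for k in range(1, 6)]
print([str(q) for q in ratios])

# k-th roots |a_k|^(1/k), computed in floating point
roots = [float(term(k)) ** (1.0 / k) for k in range(1, 7)]
print([round(x, 6) for x in roots])
```

The ratios oscillate between values tending to $0$ and values tending to $+\infty$, so they have no limit, while the roots alternate between $1/3$ and $1/2$, giving $\limsup = 1/2$.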


Exercise Set 6.2 


In each of the following eight exercises, determine whether the indicated series 
converges. Justify your answer by indicating what test to use and then carrying 
out the details of the application of that test. 


a 

iMs 
¢ 
Tle 


1 
So FT 
6 OX. 
4 
k=1 
(2 
8. 


k 
9. Verify the integral formulas (6.2.1) used in the proof of the integral test. 


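10. Prove that if $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$ are convergent series and $c$ is a constant, then $\sum_{k=1}^{\infty} c a_k$ and $\sum_{k=1}^{\infty} (a_k + b_k)$ are also convergent. Furthermore,

\[ \sum_{k=1}^{\infty} c a_k = c \sum_{k=1}^{\infty} a_k, \quad \text{and} \quad \sum_{k=1}^{\infty} (a_k + b_k) = \sum_{k=1}^{\infty} a_k + \sum_{k=1}^{\infty} b_k. \]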




11. Prove that if $\sum_{k=1}^{\infty} a_k$ converges absolutely and $\{b_k\}$ is a bounded sequence, then $\sum_{k=1}^{\infty} a_k b_k$ also converges absolutely.

12. Prove that if $\sum_{k} a_k$ and $\sum_{k} b_k$ are series and $a_k = b_k$ except for finitely many values of $k$, then the two series either both converge or they both diverge.


6.3. Absolute and Conditional Convergence


By Corollary 6.1.10, if a series converges absolutely, then it converges. The converse 
is not true. As we shall see, it is possible for a series to converge even though the 
corresponding series of absolute values does not converge. 


Definition 6.3.1. A series which converges but does not converge absolutely is 
said to converge conditionally. 


Thus, a conditionally convergent series is one which converges, but its corre- 
sponding series of absolute values does not converge. For examples of conditionally 
convergent series, we turn to alternating series. 


Alternating Series. An alternating series is one in which the terms alternate 
in sign — each positive term is followed by a negative term and vice versa. Under 
reasonable additional conditions, such a series will converge. 


Theorem 6.3.2 (Alternating Series Test). Let $\{a_k\}$ be a non-increasing sequence of non-negative numbers which converges to $0$. Then the series

\[ \sum_{k=1}^{\infty} (-1)^{k+1} a_k = a_1 - a_2 + a_3 - a_4 + \cdots \]

converges. In fact, if $s_n$ is the $n$th partial sum of this series and $s = \lim s_n$, then

\[ |s - s_n| \le a_{n+1} \quad \text{for all } n. \]

Proof. Since $\{a_k\}$ is a non-increasing sequence of non-negative numbers, we have $a_k - a_{k+1} \ge 0$ for all $k$. For $n$ odd, this means

\[ s_{n+1} \le s_{n+1} + a_{n+2} = s_{n+2} = s_n - (a_{n+1} - a_{n+2}) \le s_n. \]

That is,

\[ s_{n+1} \le s_{n+2} \le s_n \quad \text{for odd } n. \]

Similarly,

\[ s_n \le s_{n+2} \le s_{n+1} \quad \text{for even } n. \]

Thus, $s_2 \le s_3 \le s_1$ and, after that, each term of the sequence $\{s_n\}$ lies between the previous two terms. It follows that

\[ s_2 \le s_4 \le s_6 \le \cdots \le s_{2n} \le s_{2n+1} \le \cdots \le s_5 \le s_3 \le s_1. \]

Hence, the subsequence of $\{s_n\}$ consisting of terms with odd index $n$ forms a non-increasing sequence which is bounded below, while the subsequence of terms with even index $n$ forms a non-decreasing sequence which is bounded above. These two monotone, bounded sequences converge, and they must converge to the same limit $s$ because

\[ |s_{n+1} - s_n| = a_{n+1} \]

and the sequence $\{a_n\}$ converges to $0$. Since $s$ is between $s_n$ and $s_{n+1}$ for each $n$, this also shows that

\[ |s - s_n| \le a_{n+1}, \]

as claimed. □
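The error estimate $|s - s_n| \le a_{n+1}$ can be observed on the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \cdots$. The sketch below assumes the standard fact, not established in this section, that this series sums to $\ln 2$; the function name is also an illustrative assumption.

```python
import math

def alternating_harmonic_partial_sum(n):
    """s_n = sum of (-1)**(k+1) / k for k = 1, ..., n."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

s = math.log(2.0)  # assumed value of the sum (a standard fact, not proved here)
for n in (1, 2, 10, 100, 1000):
    s_n = alternating_harmonic_partial_sum(n)
    error = abs(s - s_n)
    # The theorem bounds the error by the first omitted term a_{n+1} = 1/(n+1).
    print(n, s_n, error, 1.0 / (n + 1), error <= 1.0 / (n + 1))
```

Odd-indexed partial sums lie above $\ln 2$ and even-indexed ones below it, in line with the chain of inequalities in the proof.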


An alternating p-series is a series of the form

\[ \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k^p}, \]

where $p > 0$.

Example 6.3.3. Show that each alternating p-series with $0 < p \le 1$ converges conditionally.

Solution: The alternating p-series satisfies the conditions of the alternating series test, since $\{1/k^p\}$ is a decreasing sequence which converges to $0$. Thus, the alternating p-series converges for all $p > 0$. However, the ordinary p-series $\sum_{k=1}^{\infty} \frac{1}{k^p}$ diverges if $p \le 1$ (Example 6.2.2). Thus, the alternating p-series converges conditionally for $0 < p \le 1$.

In particular, the alternating harmonic series

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + (-1)^{k+1} \frac{1}{k} + \cdots \]

converges conditionally.


Absolute versus Conditional Convergence. Absolute convergence is a much stronger condition than conditional convergence. The importance of the concept of absolute convergence stems from the fact that, if the terms of an absolutely convergent series are rearranged to form a new series, then the new series converges to the same number as the original series (Theorem 6.3.5 below). This is not true of conditionally convergent series; in fact, it fails spectacularly. A conditionally convergent series can be rearranged so as to diverge to $\infty$ or $-\infty$ or to converge to any given number (Theorem 6.3.4 below).

By a rearrangement of a series $\sum_{k=1}^{\infty} a_k$ we mean a series of the form $\sum_{j=1}^{\infty} a_{k(j)}$, where $k(j)$ is a one-to-one function from $\mathbb{N}$ onto $\mathbb{N}$. In other words, the rearranged series has exactly the same terms as the original series, but arranged in a different order.


Theorem 6.3.4. A conditionally convergent series has, for each extended real num- 
ber L, a rearrangement that converges to L. 


142 6. Infinite Series 


Proof. If $\sum_{k=1}^\infty a_k$ is a conditionally convergent series, then by Exercise 6.3.7, the


series of positive terms of this series diverges, as does the series of negative terms. 
Since the series of positive terms diverges, its sequence of partial sums is unbounded
and, hence, has limit ∞. Similarly, for the series of negative terms, the partial sums
have limit −∞.

We will prove the theorem in the case where L is a real number. The cases
where L is ∞ or −∞ are left to the exercises.


Given a number L, we will define a sequence $\{b_j\}$ inductively in the following
way: we let $b_1$ be the first positive term in $\{a_k\}$ if $0 < L$ and the first non-positive
term in $\{a_k\}$ if $L \le 0$. Suppose $b_1, b_2, \dots, b_n$ have been chosen. We set

$$s_n = \sum_{j=1}^n b_j$$

and choose $b_{n+1}$ according to the following rule: if $s_n < L$, we choose $b_{n+1}$ to be
the first positive term in $\{a_k\}$ that has not already been used. If $L \le s_n$, we choose
$b_{n+1}$ to be the first non-positive term in $\{a_k\}$ that has not already been used. This
defines the sequence $\{b_j\}$ inductively. The series $\sum_{j=1}^\infty b_j$ defined in this way has the
following properties:

(1) Each successive partial sum $s_n$ is either as close or closer to L than its
predecessor $s_{n-1}$ or one of them is less than L and the other is greater than or equal
to L. In the latter case, the distance from $s_n$ to L is less than $|s_n - s_{n-1}| = |b_n|$.


We call n a crossing integer in this case.

(2) There are infinitely many crossing integers. Our description of $\sum_{j=1}^\infty b_j$
involves adding successive positive terms until we reach or exceed L and then adding
successive non-positive terms until we fall below L. Since the series of positive terms 
and the series of negative terms both diverge, no matter where a given partial sum 
lies we will always be able to add enough of the remaining positive terms to reach 
or exceed L or add enough of the remaining non-positive terms to fall below L. 
Thus, crossing L will occur infinitely often. 


(3) All the terms of $\{a_k\}$ are used in constructing the sequence $\{b_j\}$, since at
each step we are selecting the first positive term not already chosen or the first
non-positive term not already chosen and both cases occur infinitely often. Thus,
each $a_k$ will be chosen eventually. Also, at each stage we only choose from the
terms not already chosen, and so each $a_k$ will be used just once. This means that
the sequence $\{b_j\}$ is a rearrangement of the sequence $\{a_k\}$.

(4) Since $\sum_{k=1}^\infty a_k$ converges, we have $\lim a_k = 0$, and this implies $\lim b_j = 0$ also.
This is proved as follows: if ε > 0, there is an N such that $|a_k| < ε$ whenever k > N.
However, if we choose M to be an integer such that, by stage M in our construction,
all the terms $a_1, a_2, \dots, a_N$ have been chosen, then j > M implies that $b_j$ is not
one of these terms and, hence, is a term $a_k$ with k > N. This, in turn, implies that
$|b_j| < ε$.


Now (1), (2), and (4) imply that $\lim s_n = L$. That is, the crossing integers
define a subsequence of $\{s_n\}$ (by (2)) that is converging to L (by (1) and (4)) and,
between two successive crossing integers, the sequence $\{s_n\}$ stays at least as close
to L as it was at the first crossing integer of the pair (by (1)).

Thus, $\sum_{j=1}^\infty b_j$ is a rearrangement of $\sum_{k=1}^\infty a_k$ which converges to L. □
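The selection rule in this proof is effectively an algorithm, and it can be sketched in a few lines. The code below is an illustration under stated assumptions: it uses the alternating harmonic series as the conditionally convergent series, a hypothetical target L = 0.25, and a function name `rearrange_to` that is not from the text.

```python
from itertools import count

def rearrange_to(target, n_terms):
    """Greedy rearrangement of the alternating harmonic series 1 - 1/2 + 1/3 - ...

    Rule from the proof: if the running sum is below the target, take the
    next unused positive term; otherwise take the next unused negative term.
    Returns the list of partial sums s_1, ..., s_{n_terms}.
    """
    positives = (1.0 / k for k in count(1, 2))    # 1, 1/3, 1/5, ...
    negatives = (-1.0 / k for k in count(2, 2))   # -1/2, -1/4, -1/6, ...
    s, sums = 0.0, []
    for _ in range(n_terms):
        s += next(positives) if s < target else next(negatives)
        sums.append(s)
    return sums

sums = rearrange_to(target=0.25, n_terms=20000)
print(sums[-1])   # should be close to the chosen target
```

Because each generator hands out its terms in order and never repeats one, every term is used at most once, which mirrors property (3).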


The above theorem illustrates that a conditionally convergent series is a rather 
unstable object, since its sum is dependent on the order in which the terms are
added. On the other hand, an absolutely convergent series is quite stable in the 
sense that the sum is always the same regardless of the order in which the terms 
are summed. That is the content of the next theorem. 


Theorem 6.3.5. Each rearrangement of an absolutely convergent series converges 
to the same number as the original series. 


Proof. Let $\sum_{k=1}^\infty a_k$ be an absolutely convergent series which converges to the number
s. Since this series is absolutely convergent, the series $\sum_{k=1}^\infty |a_k|$ also converges to a
number t. The difference between t and the nth partial sum of this series is

$$\sum_{k=n+1}^\infty |a_k|.$$

Because the partial sums converge to t, given ε > 0, there is an N such that

(6.3.1)  $\sum_{k=n+1}^\infty |a_k| < ε/2$ for all $n \ge N$.

We fix one such n, and we also choose it to be large enough so that

(6.3.2)  $\left| s - \sum_{k=1}^n a_k \right| < ε/2.$

Now suppose $\sum_{j=1}^\infty b_j$ is a rearrangement of $\sum_{k=1}^\infty a_k$. Then $b_j = a_{k(j)}$ for some
one-to-one function k(j) of N onto N. Let J be the largest value of j for which
$k(j) \le n$. Then the terms $a_1, a_2, \dots, a_n$ of the original series all appear as terms in
the partial sum $\sum_{j=1}^m b_j$ as long as $m \ge J$. For such an m, the expression

$$\sum_{j=1}^m b_j - \sum_{k=1}^n a_k$$

is a finite sum of terms $a_k$ with k > n. By (6.3.1) and the triangle inequality, such
a sum must have absolute value less than ε/2. This, combined with (6.3.2), implies
that

$$\left| s - \sum_{j=1}^m b_j \right| < ε \quad \text{whenever } m \ge J.$$

Hence, the series $\sum_{j=1}^\infty b_j$ also converges to s. □
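The contrast between Theorems 6.3.4 and 6.3.5 can be seen numerically. The sketch below applies one fixed rearrangement (two odd-index terms, then one even-index term) to a conditionally convergent series and to an absolutely convergent one. The choice of rearrangement, the two example series, and the reference values $\ln 2$, $\tfrac{3}{2}\ln 2$ and $\pi^2/12$ are assumptions brought in for illustration; they are standard facts rather than results from this section.

```python
import math
from itertools import count

def two_plus_one_minus(term, n_blocks):
    """Sum a rearrangement: two odd-index terms, then one even-index term."""
    odd = (term(k) for k in count(1, 2))
    even = (term(k) for k in count(2, 2))
    total = 0.0
    for _ in range(n_blocks):
        total += next(odd) + next(odd) + next(even)
    return total

cond = lambda k: (-1) ** (k + 1) / k       # conditionally convergent, sum ln 2
absl = lambda k: (-1) ** (k + 1) / k**2    # absolutely convergent, sum pi^2/12

# Rearranged sum versus the sum in the original order.
print(two_plus_one_minus(cond, 100_000), math.log(2))        # these differ
print(two_plus_one_minus(absl, 100_000), math.pi**2 / 12)    # these agree
```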


Products of Series. In calculus we are taught how to multiply two power series. 
The formula for doing this relies on the following result, which requires that the 
two series be absolutely convergent (see Exercise 6.3.12). 


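Theorem 6.3.6. Let $\sum_{k=0}^\infty a_k$ and $\sum_{j=0}^\infty b_j$ be two absolutely convergent series. Then

(6.3.3)  $\left(\sum_{k=0}^\infty a_k\right)\left(\sum_{j=0}^\infty b_j\right) = \sum_{n=0}^\infty \sum_{k=0}^n a_k b_{n-k},$

where the series on the right also converges absolutely.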


Proof. Consider the set $S = \{a_k b_j : j, k \in N\}$. The numbers in this set can be
displayed in an infinite array, or matrix, as follows:

(6.3.4)
$$\begin{array}{cccccc}
a_0 b_0 & a_1 b_0 & a_2 b_0 & \cdots & a_n b_0 & \cdots \\
a_0 b_1 & a_1 b_1 & a_2 b_1 & \cdots & a_n b_1 & \cdots \\
a_0 b_2 & a_1 b_2 & a_2 b_2 & \cdots & a_n b_2 & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
a_0 b_n & a_1 b_n & a_2 b_n & \cdots & a_n b_n & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{array}$$

The sum of the absolute values of the members of any finite subset of this set
is less than or equal to

$$M = \left(\sum_{k=0}^\infty |a_k|\right)\left(\sum_{j=0}^\infty |b_j|\right) = \sum_{j=0}^\infty \sum_{k=0}^\infty |a_k||b_j|.$$

Since M is finite, this means that, given any series formed by summing the elements
of S in some order, the corresponding series of absolute values will have partial sums
bounded above by M. Such a series must converge. Thus, each series formed by
summing the elements of S in some order will be absolutely convergent, and all
such series will converge to the same number by the previous theorem.

One series formed by summing the elements of S is

$$a_0 b_0 + a_0 b_1 + a_1 b_1 + a_1 b_0 + a_0 b_2 + a_1 b_2 + a_2 b_2 + a_2 b_1 + a_2 b_0 + \cdots.$$

That is, in the array (6.3.4), for successive values of n, we sum from left to right
along the nth row to the main diagonal and then along the nth column from the
main diagonal back to the top row. The $n^2$ partial sum of this series is

$$\left(\sum_{k=0}^{n-1} a_k\right)\left(\sum_{j=0}^{n-1} b_j\right).$$

This sequence of numbers converges to the left side of equation (6.3.3).

Another way of summing the numbers in the set S yields the series

$$\sum_{n=0}^\infty \sum_{k=0}^n a_k b_{n-k}.$$

This is obtained by summing the array (6.3.4) along diagonals of the form k + j = n
for successive values of n. The resulting sum is the right side of equation (6.3.3).
Since these two series must sum to the same number by the previous theorem,
equation (6.3.3) is true. □
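As a quick sanity check of the product formula (6.3.3), the diagonal sums can be computed directly. This is a minimal sketch; the two geometric series below are assumed examples chosen because their sums, 3/2 and 2/3, are known in closed form, and `cauchy_product_partial` is an illustrative name.

```python
def cauchy_product_partial(a, b, N):
    """Return sum_{n=0}^{N} sum_{k=0}^{n} a(k) * b(n - k)."""
    return sum(a(k) * b(n - k) for n in range(N + 1) for k in range(n + 1))

a = lambda k: 3.0 ** (-k)    # geometric series with sum 3/2
b = lambda k: (-0.5) ** k    # geometric series with sum 2/3

# The diagonal sums should approach (3/2) * (2/3) = 1.
print(cauchy_product_partial(a, b, 60))
```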


Exercise Set 6.3 


In each of the next five exercises, determine whether the given series converges 
absolutely, converges conditionally, or diverges. Justify your answer. 


6. Give an example of two convergent series $\sum_{k=1}^\infty a_k$ and $\sum_{k=1}^\infty b_k$ such that the series
$\sum_{k=1}^\infty a_k b_k$ diverges.

7. If $\sum_{k=1}^\infty a_k$ is a series, we set $a_k^+ = a_k$ if $a_k \ge 0$, $a_k^+ = 0$ if $a_k < 0$ and $a_k^- = a_k$ if
$a_k \le 0$, $a_k^- = 0$ if $a_k > 0$. Prove that if the series is conditionally convergent,
then both $\sum_{k=1}^\infty a_k^+$ and $\sum_{k=1}^\infty a_k^-$ diverge.
k=1 k=l 




8. Approximate the sum of the alternating harmonic series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + (-1)^{k+1}\frac{1}{k} + \cdots$$

to within .01 by computing an appropriate partial sum. You will need a calculator
or computer.


9. For the alternating harmonic series of the preceding exercise, follow the pro- 
cedure used in the proof of Theorem 6.3.4 to rearrange the series so that it 
converges to $\sqrt{2}$. Carry out this procedure until the partial sum of your new
series is within .02 of $\sqrt{2}$. You will need a calculator or a computer.


10. Show how to modify the proof of Theorem 6.3.4 to cover the cases L = ∞ and
L = −∞.


11. The geometric series $\sum_{k=0}^\infty 2^{-k}$ converges to 2. Use the product formula of
Theorem 6.3.6 to show that the series $\sum_{k=0}^\infty (k+1)2^{-k}$ converges to 4.


12. Show that the product formula in Theorem 6.3.6 may fail to be true if the
series involved are not absolutely convergent. Hint: Consider the case where
both series are $\sum_{k=0}^\infty \frac{(-1)^k}{\sqrt{k+1}}$.


6.4. Power Series 


One of the most useful and widely used techniques of modern mathematics is that 
of expressing a complicated function as the sum of a series of simple functions. 
Examples include power series, Fourier series, and various eigenfunction expansions 
for differential equations. All involve series whose terms are functions rather than 
numbers. 


Series of Functions. Consider a series of the form 


(6.4.1)  $\sum_{k=1}^\infty f_k(x) = f_1(x) + f_2(x) + f_3(x) + \cdots + f_k(x) + \cdots,$

where I is an interval in R and each of the functions $f_k(x)$ is a function defined on
I. For each fixed value of x ∈ I, this is just an ordinary series of numbers and it
may or may not converge. The series may converge for some values of x and not
for others. On the subset of I for which the series does converge, it defines a new
function

$$g(x) = \sum_{k=1}^\infty f_k(x).$$




This function is the limit of the sequence of functions

$$g_n(x) = \sum_{k=1}^n f_k(x)$$

obtained by taking the partial sums of the series.

There are many questions one can ask about this situation: if the functions
$f_k(x)$ are continuous or differentiable, is the same thing true of the function g that
the series converges to? Can we integrate the function g over a subinterval of I by
integrating the series term by term? When can we differentiate g by differentiating
the series term by term? We can give satisfactory answers to a couple of these
questions right away.


Definition 6.4.1. We say a series of functions (6.4.1) converges uniformly to g on
I if its sequence of partial sums $\{g_n\}$ converges uniformly to g.


Theorem 6.4.2. If each $f_k$ is a continuous function on I and the series (6.4.1)
converges uniformly to g on I, then g is also continuous on I.


Proof. If the series (6.4.1) converges uniformly to g on I, then its sequence of
partial sums $\{g_n\}$ converges uniformly to g on I. Each $g_n$ is a finite sum of functions
$f_k$ which are continuous on I and, hence, is also continuous on I. Since the limit
of a uniformly convergent sequence of continuous functions is continuous (Theorem
3.4.4), we conclude that g is continuous on I. □


The proof of the next theorem is very similar: the theorem follows directly from
the analogous result about integrating the uniform limit of a sequence of functions 
(Exercise 5.2.13). We leave the details to the exercises. 


Theorem 6.4.3. If each $f_k$ is continuous on [a,b] and the series (6.4.1) converges
uniformly to g on [a,b], then

$$\int_a^b g(x)\,dx = \sum_{k=1}^\infty \int_a^b f_k(x)\,dx.$$


This means, in particular, that the series on the right converges. 


Weierstrass M-test. The following is a test for uniform convergence of a series.


It follows from an analogous test for uniform convergence of sequences (Theorem 
3.4.6). 


Theorem 6.4.4 (Weierstrass M-test). A series of functions (6.4.1) on an
interval I converges uniformly on I if there is a convergent series of positive numbers
$\sum_{k=1}^\infty M_k$ such that $|f_k(x)| \le M_k$ for all x ∈ I and all k ∈ N.


Proof. By the comparison test, at each x the series (6.4.1) converges to a number
g(x). If

$$g_n(x) = \sum_{k=1}^n f_k(x),$$

then

$$|g(x) - g_n(x)| = \left| \sum_{k=n+1}^\infty f_k(x) \right| \le \sum_{k=n+1}^\infty |f_k(x)| \le \sum_{k=n+1}^\infty M_k = S - S_n,$$

where S and $S_n$ are the sum and nth partial sum of the series $\sum_{k=1}^\infty M_k$. Since
this series converges, $\lim (S - S_n) = 0$. The theorem now follows from Theorem
3.4.6. □
Example 6.4.5. Analyze the Fourier series

$$\sum_{k=1}^\infty \frac{\cos kx}{k^2}$$

using the preceding three theorems.

Solution: We have $\left| \frac{\cos kx}{k^2} \right| \le \frac{1}{k^2}$ for all x ∈ R. The series $\sum_{k=1}^\infty \frac{1}{k^2}$ converges,
since it is a p-series with p > 1. Thus, it follows from the Weierstrass M-test that
the series $\sum_{k=1}^\infty \frac{\cos kx}{k^2}$ converges uniformly on R. The function g that it converges
to is continuous on R by Theorem 6.4.2. On every bounded interval [a,b], we have

$$\int_a^b g(x)\,dx = \sum_{k=1}^\infty \frac{1}{k^2} \int_a^b \cos kx\,dx = \sum_{k=1}^\infty \frac{1}{k^3} (\sin kb - \sin ka),$$

by Theorem 6.4.3.
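The uniform bound behind the M-test can be observed numerically for this example. The sketch below is illustrative only: it replaces the limit function g by a high partial sum (an assumption, with `n_ref = 4000`), samples x on a finite grid over $[-\pi, \pi]$, and compares the largest gap with the matching piece of the tail $\sum M_k = \sum 1/k^2$.

```python
import math

def g_partial(x, n):
    """n-th partial sum of sum_{k>=1} cos(kx)/k^2."""
    return sum(math.cos(k * x) / k**2 for k in range(1, n + 1))

def sup_gap(n, n_ref=4000, grid=200):
    """Largest |g_ref(x) - g_n(x)| on a grid over [-pi, pi], with g_ref = g_{n_ref}."""
    xs = [-math.pi + 2 * math.pi * i / grid for i in range(grid + 1)]
    return max(abs(g_partial(x, n_ref) - g_partial(x, n)) for x in xs)

for n in (10, 100):
    # Sum of M_k = 1/k^2 over n < k <= n_ref, the bound from the M-test argument.
    tail = sum(1.0 / k**2 for k in range(n + 1, 4001))
    print(n, sup_gap(n), tail)
```

The gap does not depend on x beyond the bound, which is the point of uniform convergence: one tail estimate works on the whole interval.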


Power Series. A power series centered at a is a series of the form

(6.4.2)  $\sum_{k=0}^\infty c_k (x-a)^k.$

This is a series with terms $c_k(x-a)^k$ which are very simple: they are simple
monomials in (x − a) and, hence, each is defined on all of R, is continuous, and, in fact,
has derivatives of all orders. The partial sums of a power series are polynomials.
The numbers $c_k$ are called the coefficients.


A power series may converge for some values of x and not for others. The next
theorem tells us a great deal about this question. 


Theorem 6.4.6. Given a power series (6.4.2), let

$$R = \frac{1}{\limsup |c_k|^{1/k}},$$

where we interpret R to be ∞ (resp. 0) if $\limsup |c_k|^{1/k}$ is 0 (resp. ∞).

If R > 0, then the series (6.4.2) converges for each x with |x − a| < R and
diverges for each x with |x − a| > R. Furthermore, the series converges uniformly
on every interval of the form [a − r, a + r] with 0 < r < R. If R = 0, then the series
converges only when x = a.




Proof. We first suppose R > 0. Given any r > 0, we have

(6.4.3)  $\limsup |c_k r^k|^{1/k} = r \limsup |c_k|^{1/k} = \frac{r}{R}.$

Now suppose |x − a| = r > R. Then $|c_k (x-a)^k| = |c_k| r^k$ and the series (6.4.2)
diverges, by (6.4.3) and the root test.

On the other hand, if r < R and $|x - a| \le r$, then $|c_k (x-a)^k| \le |c_k| r^k$. In
this case $\sum_{k=0}^\infty |c_k| r^k$ converges, by the root test and (6.4.3). Then the Weierstrass
M-test implies that the series (6.4.2) converges uniformly on the closed interval
$[a-r, a+r] = \{x : |x - a| \le r\}$.

The uniform convergence of (6.4.2) on [a − r, a + r] for every r < R implies that
the series converges on (a − R, a + R), since for every x in this interval, there is an
r < R such that x lies in the interval [a − r, a + r].

If R = 0, that is, if $\limsup |c_k|^{1/k} = \infty$, then the only value of x that will
lead to $\limsup |c_k (x-a)^k|^{1/k} \le 1$ is x = a. Thus, the power series converges only
at x = a in this case. □


The above theorem tells us that the convergence set for a power series (6.4.2)
is an interval of radius $R = (\limsup |c_k|^{1/k})^{-1}$, centered at a. This interval is called
the interval of convergence for the power series. The number R is called the radius
of convergence of the series. Since the theorem says nothing when |x − a| = R,
it does not tell us whether this interval is open, closed, or half-open, half-closed.


Each of these possibilities occurs. 
Example 6.4.7. Give examples where the three possibilities mentioned in the
previous paragraph occur.

Solution: The examples are

$$\text{(a)}\ \sum_{k=0}^\infty x^k, \qquad \text{(b)}\ \sum_{k=1}^\infty \frac{x^k}{k}, \qquad \text{(c)}\ \sum_{k=1}^\infty \frac{x^k}{k^2}.$$

In each case, the radius of convergence R is 1, since

$$1 = \lim k^{1/k} = (\lim k^{1/k})^2 = \lim (k^2)^{1/k}.$$

When x = ±1, series (a) diverges by the term test, since its terms are all ±1; thus,
its interval of convergence is (−1, 1).

Series (b) is the harmonic series when x = 1 and the alternating harmonic
series when x = −1; thus, its interval of convergence is [−1, 1).

Series (c) is the p-series with p = 2 at x = 1 and the alternating p-series with
p = 2 when x = −1. Both series are convergent and so the interval of convergence
for (c) is [−1, 1].


Remark 6.4.8. Although the expression for the radius of convergence R, given in 
the previous theorem, is useful because it makes sense regardless of the series, it is 
often the case that the ratio test provides a more practical method for calculating 
the radius of convergence of a power series. 
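To illustrate the remark, here is a minimal sketch comparing the two ways of estimating R. It rests on an assumption that should be stated plainly: both estimates evaluate a single large index k, which only makes sense when the relevant limits exist, and it is not a computation of a true lim sup. The function names are illustrative.

```python
def root_estimate(c, k):
    """Estimate R by 1 / |c_k|^(1/k) at one large k (assumes the limit exists)."""
    return 1.0 / abs(c(k)) ** (1.0 / k)

def ratio_estimate(c, k):
    """Estimate R by |c_k / c_{k+1}| at one large k (assumes the limit exists)."""
    return abs(c(k) / c(k + 1))

c = lambda k: 1.0 / k**2    # coefficients 1/k^2, as in series (c) of Example 6.4.7

for k in (10, 100, 1000):
    print(k, root_estimate(c, k), ratio_estimate(c, k))
```

Both columns approach 1, but the ratio estimate gets there noticeably faster for these coefficients, which is one practical reason to prefer it when it applies.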




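Example 6.4.9. Find the radius of convergence of the power series $\sum_{k=0}^\infty \frac{x^k}{k!}$.

Solution: We apply the ratio test. We have

$$\lim_{k\to\infty} \left| \frac{x^{k+1}/(k+1)!}{x^k/k!} \right| = \lim_{k\to\infty} \frac{|x|}{k+1} = 0$$

for all x. Thus, the series converges for all x and its radius of convergence is +∞.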


Integration of Power Series. Since a power series centered at a, with radius of 
convergence R, converges uniformly on each interval of the form [a —r,a +r] with 
0 < r < R, our earlier theorems concerning continuity (Theorem 6.4.2) and term
by term integration (Theorem 6.4.3) apply. They lead to the following theorem. 


Theorem 6.4.10. If $f(x) = \sum_{k=0}^\infty c_k (x-a)^k$ on (a − R, a + R), where R is the radius
of convergence of this series, then f is continuous on (a − R, a + R) and

(6.4.4)  $\int_a^x f(t)\,dt = \sum_{k=0}^\infty \frac{c_k}{k+1} (x-a)^{k+1},$

if x ∈ (a − R, a + R). The latter series also has radius of convergence R.


Proof. The continuity of f is a direct consequence of Theorem 6.4.2, while the
integral formula follows directly from Theorem 6.4.3 and the fact that

$$\int_a^x (t-a)^k\,dt = \frac{(x-a)^{k+1}}{k+1}.$$

The statement about radius of convergence is proved as follows: if we factor (x − a)
out of the series in (6.4.4), the remaining factor is

$$\sum_{k=0}^\infty \frac{c_k}{k+1} (x-a)^k,$$

which clearly has the same convergence set and radius of convergence. By Theorem
6.4.6, its radius of convergence is the inverse of

$$\limsup \left( \frac{|c_k|}{k+1} \right)^{1/k} = \limsup |c_k|^{1/k} \lim \frac{1}{(k+1)^{1/k}} = \limsup |c_k|^{1/k},$$

and this inverse is the radius of convergence of the original series. Here, the first
equality follows from Exercise 2.6.8, while the second equality follows from the fact that
$\lim (1+k)^{1/k} = 1$ (a simple consequence of L’Hôpital’s Rule). Thus, the series in
(6.4.4) has the same radius of convergence as the original series. □


Example 6.4.11. Find a power series in x which converges to ln(1 + x) in an open
interval centered at 0. What is the largest such open interval?

Solution: If |x| < 1, the geometric series $\sum_{k=0}^\infty x^k$ converges to $\frac{1}{1-x}$. If we
replace x by −t in this series, the result is

$$\frac{1}{1+t} = \sum_{k=0}^\infty (-1)^k t^k \quad \text{for } |t| < 1.$$

If we integrate with respect to t from 0 to x, then it follows from the previous
theorem that

$$\ln(1+x) = \int_0^x \frac{1}{1+t}\,dt = \sum_{k=0}^\infty (-1)^k \frac{x^{k+1}}{k+1} = \sum_{k=1}^\infty (-1)^{k+1} \frac{x^k}{k}$$

for |x| < 1. The radius of convergence of this series is $(\limsup (1/k)^{1/k})^{-1} = 1$ and
so (−1,1) is the largest open interval on which this series converges to ln(1 + x).
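The expansion can be compared against a library logarithm. The following is a minimal sketch; the sample points and the cutoff of 200 terms are arbitrary choices, and `log1p_series` is an illustrative name.

```python
import math

def log1p_series(x, n):
    """Partial sum sum_{k=1}^{n} (-1)^(k+1) x^k / k of the series for ln(1 + x)."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, n + 1))

for x in (0.5, -0.5, 0.9):
    print(x, log1p_series(x, 200), math.log1p(x))
```

Convergence slows as |x| approaches 1, consistent with the radius of convergence being 1.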


Differentiation of Power Series. We may also differentiate power series term 
by term. 


Theorem 6.4.12. If $f(x) = \sum_{k=0}^\infty c_k (x-a)^k$ on (a − R, a + R), where R is the radius
of convergence of this series, then f is differentiable on (a − R, a + R) and, on this
interval,

(6.4.5)  $f'(x) = \sum_{k=1}^\infty k c_k (x-a)^{k-1}.$

This series also has radius of convergence R.


Proof. We set

$$g(x) = \sum_{k=1}^\infty k c_k (x-a)^{k-1}.$$

This series has the same radius of convergence as the series

$$\sum_{k=1}^\infty k c_k (x-a)^k = (x-a) \sum_{k=1}^\infty k c_k (x-a)^{k-1},$$

and that is

$$(\limsup |k c_k|^{1/k})^{-1} = (\lim k^{1/k} \limsup |c_k|^{1/k})^{-1} = R,$$

since $\lim k^{1/k} = 1$.

To complete the proof, we just need to show that g is the derivative of f.
However, by the previous theorem,

$$\int_a^x g(t)\,dt = \sum_{k=1}^\infty c_k (x-a)^k = f(x) - c_0.$$

By the Second Fundamental Theorem, f'(x) = g(x). □




Example 6.4.13. Find a power series in x which converges to $\frac{1}{(1-x)^2}$ on an open
interval centered at 0. What is the largest open interval on which this power series
expansion is valid?

Solution: As in the last example, we begin with the power series expansion of
$\frac{1}{1-x}$:

$$\frac{1}{1-x} = \sum_{k=0}^\infty x^k \quad \text{on } (-1,1).$$

If we differentiate, using the previous theorem, the result is

$$\frac{1}{(1-x)^2} = \sum_{k=1}^\infty k x^{k-1} = \sum_{k=0}^\infty (k+1) x^k$$

on (−1,1). By the theorem, this series has radius of convergence 1. Thus, (−1,1)
is the largest open interval on which this expansion is valid.


Exercise Set 6.4 


cd 


“Le is continuous on the interval (—1, 1]. 


1. Prove that the function f(« 


an P : F - 
2. Prove that the function f(a “rs is continuous on the entire real line. 


3. Let $\{f_k\}$ be a sequence of differentiable functions on (a,b) and suppose there
is a point c ∈ (a,b) such that the series $\sum_{k=1}^\infty f_k(c)$ converges. Suppose also that
the sequence of derivatives $\{f_k'\}$ satisfies $|f_k'(x)| \le M_k$ on (a,b) and the series
$\sum_{k=1}^\infty M_k$ converges. Then prove that the series defining

$$f(x) = \sum_{k=1}^\infty f_k(x) \quad \text{and} \quad g(x) = \sum_{k=1}^\infty f_k'(x)$$

converge on (a,b) and f is differentiable with derivative g on (a,b).


In each of the next five exercises, find the radius of convergence of the
indicated power series. 




7. Soka —5)*, 
k=0 


ed 


8. Shaka, 


k=0 


9. Beginning with the geometric series which converges to $\frac{1}{1-x}$ on (−1,1), find
power series which converge to $\frac{1}{1+x^2}$ and to arctan x on this same interval.

10. Let $\{a_k\}$ be a non-increasing sequence of non-negative numbers which
converges to 0. Use Theorem 6.3.2 to show that the power series $\sum_{k=0}^\infty (-1)^k a_k x^k$
converges uniformly on [0, 1] and, hence, converges to a continuous function on
this interval.

11. Use the preceding exercise and Example 6.4.11 to show that the alternating 
harmonic series $1 - 1/2 + 1/3 - \cdots + (-1)^{k+1}/k + \cdots$ converges to ln 2. Why do
we need to use the previous exercise? Why isn’t Example 6.4.11 enough? 


12. Prove that if f(x) is the sum of a power series centered at a and with radius
of convergence R, then f is infinitely differentiable on (a − R, a + R), that is,
its derivative of order m exists on this interval for all m ∈ N.


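13. Suppose functions g and h are defined by

$$g(x) = \sum_{k=0}^\infty \frac{x^{2k+1}}{(2k+1)!} \quad \text{and} \quad h(x) = \sum_{k=0}^\infty \frac{x^{2k}}{(2k)!}.$$

Find the interval of convergence for each of these functions.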


14. Prove that the functions in the previous exercise satisfy g' = h and h' = g.


15. Prove Theorem 6.4.3. 


6.5. Taylor’s Formula 


Definition 6.5.1. Suppose f is a function defined in an open interval containing 
a. If there is a power series, centered at a, which converges to f in some open 
interval centered at a, then we will say that f is analytic at a. If f is analytic at 
every point of an open interval J, then we will say that f is analytic on I. 


When can we expect that f is analytic at a? According to Exercise 6.4.12, if 
f is the sum of a power series in some interval centered at a, then f is infinitely 
differentiable in this interval (meaning its nth derivative exists for every n € N). 
Thus, in order for a function to be analytic at a, it must be infinitely differentiable 
in some interval centered at a. However, this is not enough. In fact Exercise 6.5.13 
shows that there is a function which is infinitely differentiable in an open interval 
centered at 0 but is not the sum of a power series centered at 0. 




Power Series Coefficients. If a function is analytic at a, that is, it has a
power series expansion centered at a, then it is easy to see what the coefficients of
the power series expansion must be.


Theorem 6.5.2. Suppose $f(x) = \sum_{k=0}^\infty c_k (x-a)^k$, where this series converges to
f(x) on an open interval containing a. Then $c_n = \frac{f^{(n)}(a)}{n!}$ for each n.

Proof. We prove by induction that the nth derivative of f is

(6.5.1)  $f^{(n)}(x) = \sum_{k=n}^\infty \frac{k!}{(k-n)!} c_k (x-a)^{k-n}.$

When n = 1, this just says that

$$f'(x) = \sum_{k=1}^\infty k c_k (x-a)^{k-1},$$

which is true by Theorem 6.4.12.

If we assume that (6.5.1) is true for a given n, then by differentiating it and
again using Theorem 6.4.12, we obtain

$$f^{(n+1)}(x) = \sum_{k=n+1}^\infty \frac{k!}{(k-n)!} (k-n) c_k (x-a)^{k-n-1} = \sum_{k=n+1}^\infty \frac{k!}{(k-n-1)!} c_k (x-a)^{k-n-1}.$$

Since this is (6.5.1) with n replaced by n + 1, the induction is complete and we
conclude that (6.5.1) is true for all n ∈ N.

If we set x = a in (6.5.1), all terms vanish except for the first one (the one
where k = n). Thus,

$$f^{(n)}(a) = n!\,c_n \quad \text{or} \quad c_n = \frac{f^{(n)}(a)}{n!}.$$

□


Taylor’s Formula. The previous theorem tells us that the only power series,
centered at a, that could possibly converge to f(x) in an interval centered at a is
the power series

(6.5.2)  $\sum_{k=0}^\infty \frac{f^{(k)}(a)}{k!} (x-a)^k.$

This is called the Taylor series for f at a. The nth partial sum of this series,

$$f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n,$$

is called the nth Taylor polynomial for f at a. The function f is analytic at a if and
only if the sequence of Taylor polynomials for f converges to f in some open interval
centered at a. Taylor’s formula helps decide whether this is true by providing a
formula for the remainder when f is approximated by its nth Taylor polynomial.




Theorem 6.5.3 (Taylor’s Formula). Let f be a function which has continuous
derivatives up through order n + 1 in an open interval I centered at a. Then, for
each x ∈ I,

(6.5.3)  $f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n(x),$

where

(6.5.4)  $R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!} (x-a)^{n+1}$

for some c between a and x.


Proof. This theorem is reminiscent of the Mean Value Theorem. In fact, in the
case n = 0, it is the Mean Value Theorem. It is not surprising that its proof mimics
the proof of the Mean Value Theorem.

We set

$$R_n(x) = f(x) - f(a) - f'(a)(x-a) - \cdots - \frac{f^{(n)}(a)}{n!}(x-a)^n$$

so that (6.5.3) holds. We then define a function s(t) on I by

$$s(t) = f(x) - f(t) - f'(t)(x-t) - \cdots - \frac{f^{(n)}(t)}{n!}(x-t)^n - R_n(x) \frac{(x-t)^{n+1}}{(x-a)^{n+1}}.$$

Then s(a) = s(x) = 0, and so there must be a critical point c for s somewhere
strictly between a and x. Since s is differentiable on I, this critical point must be a
point where s' is 0, that is, s'(c) = 0. In the calculation of s', all the terms cancel
except two at the very end, leaving

$$0 = s'(c) = -\frac{f^{(n+1)}(c)}{n!}(x-c)^n + (n+1) R_n(x) \frac{(x-c)^n}{(x-a)^{n+1}}.$$

Equation (6.5.4) follows from this when we solve for $R_n(x)$. □


The function $R_n(x)$ in the above theorem is called the remainder for the nth
degree Taylor formula. 


Example 6.5.4. Find the Taylor series expansion of $e^x$ at 0 and tell for which
values of x this expansion converges to $e^x$.

Solution: The function $e^x$ is infinitely differentiable on R with kth derivative
equal to $e^x$ for all x. Thus, its kth derivative evaluated at 0 is 1 for all k. Taylor’s
formula then tells us that

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + R_n(x),$$

where

$$R_n(x) = e^c \frac{x^{n+1}}{(n+1)!}$$

for some c between 0 and x.

For all values of x and c, $\lim_{n\to\infty} e^c \frac{x^{n+1}}{(n+1)!} = 0$; here c may depend on n, but
$e^c \le e^{|x|}$ since c lies between 0 and x. Hence,
the Taylor polynomials for $e^x$ converge to $e^x$ for all x ∈ R, that is, the Taylor
series expansion

(6.5.5)  $e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$

is valid for all x ∈ R.
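The remainder estimate in this example can be checked directly. The sketch below assumes the bound $e^c \le e^{|x|}$ for c between 0 and x, so that $|R_n(x)| \le e^{|x|} |x|^{n+1}/(n+1)!$; the sample point x = 2 and the function names are illustrative choices.

```python
import math

def exp_taylor(x, n):
    """n-th Taylor polynomial of e^x at 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def remainder_bound(x, n):
    """Bound on |R_n(x)|, using e^c <= e^|x| for c between 0 and x."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)

x = 2.0
for n in (5, 10, 20):
    err = abs(math.exp(x) - exp_taylor(x, n))
    print(n, err, remainder_bound(x, n))
```

In each row the actual error should sit below the bound, and both shrink rapidly as n grows.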


Example 6.5.5. Find the Taylor series expansion of sin x at 0 and tell for which
values of x this expansion converges to sin x.

Solution: The function f(x) = sin x is infinitely differentiable on R and its
first four derivatives are

$$f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(4)}(x) = \sin x.$$

Since $f^{(4)} = f$, taking nth derivatives leads to $f^{(n+4)} = f^{(n)}$ for every non-negative
integer n. Thus, at 0, the sine and its derivatives form the following repeating
sequence with period 4:

0, 1, 0, −1, 0, 1, 0, −1, 0, ....

Hence, Taylor’s formula for sin x at a = 0 is

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + R_{2n+2}(x),$$

where

$$R_{2n+2}(x) = \sin^{(2n+3)}(c) \frac{x^{2n+3}}{(2n+3)!} \quad \text{for some } c.$$

The reason we use $R_{2n+2}(x)$ rather than $R_{2n+1}(x)$ for the remainder (they are
actually equal, since the term of degree 2n + 2 is 0 in Taylor’s formula for sin x) is
that we get better estimates on the size of the remainder if we use $R_{2n+2}(x)$.

Since $|\sin^{(2n+3)}(c)| \le 1$, we have

$$|R_{2n+2}(x)| \le \frac{|x|^{2n+3}}{(2n+3)!},$$

which implies that the remainder has limit 0 for all x (see Exercise 6.5.1). Thus,
the Taylor series expansion

$$\sin x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!}$$

is valid for all x ∈ R.


Example 6.5.6. Find an estimate for the error if sin x is approximated by $x - x^3/3!$
for x in the interval [−π/4, π/4]. By an estimate for the error, we mean an upper
bound for the error which is as close to the actual error as possible without going
to extraordinary effort.

Solution: By the previous example, the difference between sin x and its third
degree Taylor polynomial has absolute value less than or equal to

$$\frac{|x|^5}{5!} < .003 \quad \text{for } -\pi/4 \le x \le \pi/4.$$
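The estimate can be compared with the actual worst-case error on the interval. This is a minimal sketch that samples the interval on a uniform grid (an assumption: the maximum over the grid stands in for the true maximum); `cubic` is an illustrative name for the third degree Taylor polynomial.

```python
import math

def cubic(x):
    """Third-degree Taylor polynomial of sin at 0."""
    return x - x**3 / 6

# Value of |x|^5 / 5! at the endpoints, where it is largest.
bound = (math.pi / 4) ** 5 / math.factorial(5)

grid = [-math.pi / 4 + (math.pi / 2) * i / 1000 for i in range(1001)]
worst = max(abs(math.sin(x) - cubic(x)) for x in grid)

print(worst, bound)
```

The observed worst error is only slightly smaller than the bound, so the estimate is close to sharp here.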




Lagrange’s Form for the Remainder. The following integral formula for the 
remainder in Taylor’s formula sometimes leads to better estimates on the size of 
the remainder than does the usual form. 


Theorem 6.5.7. If f is a function with continuous derivatives up through order
n + 1 on an open interval I containing a and x, then the remainder $R_n(x)$ in Taylor’s
formula for f at a can be written as

(6.5.6)  $R_n(x) = \frac{1}{n!} \int_a^x (x-t)^n f^{(n+1)}(t)\,dt.$


Proof. We prove (6.5.6) by induction on n with the base case being n = 0. In the
case where n = 0, Taylor’s formula is

$$f(x) = f(a) + R_0(x) \quad \text{so that} \quad R_0(x) = f(x) - f(a).$$

Equation (6.5.6) in this case is

$$f(x) - f(a) = \int_a^x f'(t)\,dt,$$

which is just the Fundamental Theorem of Calculus. Thus, (6.5.6) holds when
n = 0.

For the induction step, we assume (6.5.6) holds for a given n and proceed to
prove that it then holds for n + 1. If we apply integration by parts to the integral
on the right side of (6.5.6), the result is

$$R_n(x) = \frac{f^{(n+1)}(a)}{(n+1)!}(x-a)^{n+1} + \frac{1}{(n+1)!} \int_a^x (x-t)^{n+1} f^{(n+2)}(t)\,dt.$$

Since $R_{n+1}(x) = R_n(x) - \frac{f^{(n+1)}(a)}{(n+1)!}(x-a)^{n+1}$, this proves that (6.5.6) holds with
n replaced by n + 1, thus completing the induction step. □


Example 6.5.8. Find a power series expansion for $f(x) = (1+x)^p$ which is valid
on (−1, 1), where p is any real number.

Solution: The derivatives of f are

$$p(1+x)^{p-1}, \quad p(p-1)(1+x)^{p-2}, \quad \dots, \quad p(p-1)\cdots(p-n+1)(1+x)^{p-n}, \quad \dots.$$

The nth derivative evaluated at 0 is $p(p-1)\cdots(p-n+1)$. Thus, Taylor’s formula
for f is

$$(1+x)^p = 1 + px + \frac{p(p-1)}{2}x^2 + \cdots + \frac{p(p-1)\cdots(p-n+1)}{n!}x^n + R_n(x),$$

where

$$R_n(x) = \frac{p(p-1)\cdots(p-n)}{n!} \int_0^x \left( \frac{x-t}{1+t} \right)^n (1+t)^{p-1}\,dt$$

if we use Lagrange’s form of the remainder. However, since t is between 0 and x, t
and x have the same sign, and this implies that

$$\left| \frac{x-t}{1+t} \right| \le |x|$$

(Exercise 6.5.9). From this, we conclude that

$$|R_n(x)| \le \left| \frac{p(p-1)\cdots(p-n)}{n!} x^n \right| \left| \int_0^x (1+t)^{p-1}\,dt \right|.$$

This is just the constant $\left| \int_0^x (1+t)^{p-1}\,dt \right|$ times $|p - n|$ times the absolute value of the nth term
in the power series

(6.5.8)  $1 + px + \frac{p(p-1)}{2}x^2 + \cdots + \frac{p(p-1)\cdots(p-n+1)}{n!}x^n + \cdots,$

which happens to be the Taylor series for $(1+x)^p$ at 0. If the ratio test shows that this
series converges when |x| < 1, then its terms tend to 0 at a geometric rate, so they
still converge to 0 after multiplication by $|p - n|$. By the above, this shows that the
remainder $R_n(x)$ converges to 0 and, hence, that this series converges to $(1+x)^p$ when |x| < 1.

We prove that (6.5.8) converges on (−1, 1) by using the ratio test. For the
absolute value of the ratio of term n + 1 to term n, we get

$$\left| \frac{p-n}{n+1} \right| |x|,$$

which has limit |x| as n → ∞. Hence, the series (6.5.8) converges for |x| < 1 and it
converges to $(1+x)^p$.


The Taylor series for $(1+x)^p$, as derived above, is called the binomial series
with exponent p. Note that when p is a positive integer, the series (6.5.8) terminates
at n = p; that is, all terms with n > p are zero and Taylor’s formula for $(1+x)^p$ at
0 with $n \ge p$ is just

(6.5.9)  $(1+x)^p = 1 + px + \frac{p(p-1)}{2}x^2 + \cdots + \frac{p(p-1)\cdots(p-k+1)}{k!}x^k + \cdots + \frac{p!}{p!}x^p,$

which is the binomial formula (Theorem 1.2.12) with a = 1 and b = x. The binomial
formula for general a and b can be deduced from this (Exercise 6.5.14).
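The binomial series is easy to evaluate by updating each term from the previous one with the ratio $\frac{p-k}{k+1}x$ used in the ratio test above. The following is a minimal sketch; the sample exponents and points, and the name `binomial_series`, are illustrative assumptions.

```python
def binomial_series(p, x, n):
    """Partial sum 1 + p x + ... + p(p-1)...(p-n+1)/n! * x^n of the binomial series."""
    total, term = 1.0, 1.0
    for k in range(n):
        term *= (p - k) / (k + 1) * x    # ratio of term k+1 to term k
        total += term
    return total

for p, x in ((0.5, 0.3), (-0.5, -0.4), (3, 0.7)):
    print(p, x, binomial_series(p, x, 200), (1 + x) ** p)
```

For the positive integer exponent p = 3 the factor (p − k) becomes 0 at k = 3, so every later term vanishes and the sum is the finite binomial expansion.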


Exercise Set 6.5 


1. Prove that $\lim_{n\to\infty} \frac{x^n}{n!} = 0$ for all x.

2. Find the Taylor series expansion for cos x at 0 and show that it converges for
all x.

3. Use Taylor’s formula to estimate the error if cos x is approximated by $1 - \frac{x^2}{2}$
on the interval [−.1, .1].


4. What is the smallest n for which we can be sure that

$$1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!}$$

is within .001 of e?


5. What is Taylor's formula for the function $f(x) = \sqrt{1+x}$ with $a = 0$?


3 7? — da +4 with a= 1? 


What is Taylor's formula for the function f(«) = 


7. What is Taylor's formula for $\ln(1+x)$ with $a = 0$? Compare with Example
   6.4.11.

8. Use the binomial series with $p = -1/2$ to get a power series expansion for
   $\frac{1}{\sqrt{1+x}}$ valid on $(-1,1)$. Use this to get power series expansions first for
   $\frac{1}{\sqrt{1-x^2}}$ and then for $\arcsin x$ on this same interval.

9. Prove that if $x \in (-1,1)$ and $t$ is between 0 and $x$ (so that $t$ and $x$ have the
   same sign and $|t| \le |x| < 1$), then

   $$\left|\frac{x-t}{1+t}\right| \le |x|.$$

10. Verify the computation of $s'$ given in the proof of Theorem 6.5.3.


11. Prove that if $f$ is an infinitely differentiable function on $(a-r, a+r)$ and there
    is a constant $K$ such that

    $$|f^{(n)}(x)| \le K\,\frac{n!}{r^n}$$

    for all $n \in \mathbb{N}$ and all $x \in (a-r, a+r)$, then the Taylor series for $f$ at $a$ converges
    to $f$ on $(a-r, a+r)$.


12. Use L'Hôpital's Rule to show that $\lim_{x \to 0} \frac{e^{-1/x^2}}{x^n} = 0$ for every $n$.

13. If $g(x) = e^{-1/x^2}$ for $x \ne 0$ and $g(0) = 0$, show that $g$ is infinitely differentiable
    on the entire real line but all of its derivatives at 0 are equal to 0. Argue that
    this means that $g$ cannot be analytic at 0. Hint: Use the previous exercise to
    help compute the derivatives of $g$ at 0.

14. Prove that the binomial formula (Theorem 1.2.12) for a general $a$ and $b$ follows
    from the Taylor series expansion (6.5.9) of $(1+x)^p$ for $p$ a positive integer.

15. Give a new proof that $e^x e^y = e^{x+y}$ by using the Taylor series expansion for $e^x$
    (6.5.5) and the product formula of Theorem 6.3.6. You will also need to use
    the binomial formula.


Chapter 7 


Convergence in 
Euclidean Space 


With this chapter we begin our study of calculus in several variables. The first task
is to define $\mathbb{R}^d$, Euclidean space of dimension $d$. We will then study convergence of
sequences of points in this space and introduce the concepts of open and closed sets.
These are generalizations to $\mathbb{R}^d$ of the concepts of open and closed intervals in $\mathbb{R}$.
In the final two sections we introduce the concepts of compact sets and connected
sets. These are also generalizations to $\mathbb{R}^d$ of properties of intervals in $\mathbb{R}$. These
ideas will be of fundamental importance when we study continuous functions on
$\mathbb{R}^d$ in the next chapter.


In order to define and study convergence and continuity, we don't need to use
all of the properties of $\mathbb{R}^d$, only the ones derived from the concept of distance
between points. A set together with a well-behaved notion of distance between
pairs of points is called a metric space. In the coming pages, we will give a more
precise definition of metric space and will point out how many of the definitions
and theorems we develop in this chapter are valid, not only in $\mathbb{R}^d$, but in any metric
space.


7.1. Euclidean Space 


The space $\mathbb{R}^d$ is defined to be the set of all $d$-tuples of real numbers, where, by a
$d$-tuple of real numbers, we mean an ordered set $(x_1, x_2, \ldots, x_d)$ of $d$ real numbers.
It is ordered because the numbers are listed in a certain order and, if this order is
changed, then the new $d$-tuple is different from the old one (unless the change of
order just interchanges identical numbers). For example, $(5,0,7)$ and $(0,5,7)$ are
different 3-tuples and, hence, different points of $\mathbb{R}^3$.

The spaces $\mathbb{R}^2$ and $\mathbb{R}^3$ are familiar from calculus. The space $\mathbb{R}^2$ is the set of
all ordered pairs $(x_1, x_2)$ of real numbers, while $\mathbb{R}^3$ is the set of ordered triples
$(x_1, x_2, x_3)$ of real numbers. Often points of $\mathbb{R}^2$ are denoted $(x, y)$ rather than
$(x_1, x_2)$ and points of $\mathbb{R}^3$ are denoted $(x, y, z)$ rather than $(x_1, x_2, x_3)$.


The Vector Space $\mathbb{R}^d$. We will often refer to a point of $\mathbb{R}^d$ as a vector in $\mathbb{R}^d$,
while a point of $\mathbb{R}$ will often be referred to as a scalar.

There are natural operations of addition of vectors in $\mathbb{R}^d$ and multiplication
of vectors by scalars. That is, if $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ are
vectors in $\mathbb{R}^d$ and $a$ is a scalar, then we set

$$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_d + y_d)$$

and

$$ax = (ax_1, ax_2, \ldots, ax_d).$$

The zero vector (also called the origin of $\mathbb{R}^d$) is the vector

$$0 = (0, 0, \ldots, 0).$$

Note that we use the same symbol, 0, to stand for both the scalar 0 and the vector
$0 \in \mathbb{R}^d$. This shouldn't cause any confusion, since it will always be obvious from
the context which is meant.

Given a vector $x = (x_1, x_2, \ldots, x_d)$ in $\mathbb{R}^d$, the components of $x$ are the numbers
$x_1, x_2, \ldots, x_d$. The $j$th component is the number $x_j$. Two vectors are identical if
and only if their $j$th components are identical for $j = 1, 2, \ldots, d$.


As noted in the next theorem, addition in $\mathbb{R}^d$ satisfies the associative and
commutative laws and 0 has the appropriate properties. Also, scalar multiplication
satisfies an associative law and two distributive laws.


Theorem 7.1.1. Let $u, v, w$ be points of $\mathbb{R}^d$ and let $a$ and $b$ be real numbers. Then

(a) $u + (v + w) = (u + v) + w$;
(b) $u + v = v + u$;
(c) $0 + u = u$;
(d) $0u = 0$ and $1u = u$;
(e) $a(bu) = (ab)u$;
(f) $(a + b)u = au + bu$;
(g) $a(u + v) = au + av$.


Proof. Each statement asserts that two vectors are identical. Thus, each can be
proved by showing that the $j$th components of the two vectors are identical for each
$j$. In each case, this follows immediately from the definitions and the fact that $\mathbb{R}$
satisfies the field axioms A1-A4, M1-M4, and D (see Section 1.3). □


A set together with operations of addition and scalar multiplication (where 
the scalars belong to some field F), satisfying the properties listed in the above 
theorem, is called a vector space over F (see Section 1.3 for the definition of a 
field). Hence, $\mathbb{R}^d$ is a vector space over the field $\mathbb{R}$.


Using only the vector space axioms listed in Theorem 7.1.1, one can easily 
derive all of the properties of general vector spaces. 


Example 7.1.2. Using only the properties listed in Theorem 7.1.1, prove that if
$x$ is an element of a vector space, then $(-1)x$ is an additive inverse for $x$. That is,
prove that $x + (-1)x = 0$.

Solution: By Theorem 7.1.1(d) and (f) we have

$$x + (-1)x = (1 + (-1))x = 0x = 0.$$

In view of this example, $(-1)x$ is an additive inverse for $x$ and so it makes sense
to denote it simply $-x$.


Other properties of vector spaces will be derived in the exercises. 


Inner Product. 


Definition 7.1.3. The Euclidean inner product of two vectors $u = (u_1, \ldots, u_d)$
and $v = (v_1, \ldots, v_d)$ in $\mathbb{R}^d$ is the real number

$$(7.1.1)\qquad u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_d v_d.$$


This has the following simple properties. The proof is left to the exercises. 
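Theorem 7.1.4. If $u, v, w \in \mathbb{R}^d$ and $a \in \mathbb{R}$, then

(a) $u \cdot v = v \cdot u$;
(b) $(u + v) \cdot w = u \cdot w + v \cdot w$;
(c) $(au) \cdot v = a(u \cdot v)$;
(d) $u \cdot u > 0$ unless $u = 0$, in which case $u \cdot u = 0$.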

More generally, a function from pairs of vectors to scalars which satisfies (a) 
through (d) above is called an inner product on the vector space. A vector space 
together with an inner product on that vector space is called an inner product space. 
Thus, $\mathbb{R}^d$ is an inner product space with the inner product described in Definition
7.1.3.


There are other inner products on $\mathbb{R}^d$. For example, if each term $u_j v_j$ in (7.1.1)
is replaced by $a_j u_j v_j$, where $a_1, \ldots, a_d$ are positive scalars, then the resulting sum
defines a new inner product, which is different from the original unless all the $a_j$'s
are 1. In this text, the only inner product on $\mathbb{R}^d$ that we will use is the Euclidean
inner product as defined in (7.1.1).


Using (a) and (c) of Theorem 7.1.4, we easily show that $u \cdot (av) = a(u \cdot v)$.
Thus, for a scalar $a$ and vectors $u$ and $v$, there is no ambiguity if we simply write
$au \cdot v$ in place of any one of the three equal products

$$a(u \cdot v), \quad (au) \cdot v, \quad u \cdot (av).$$


Example 7.1.5. If $X$ is an inner product space, $x, y \in X$, and $a, b \in \mathbb{R}$, then
calculate the inner product of $ax + by$ with itself.

Solution: By (b) and (c) of the previous theorem, we have

$$(ax + by) \cdot (ax + by) = ax \cdot (ax + by) + by \cdot (ax + by).$$

By (a), (b), and (c) we have

$$ax \cdot (ax + by) = a^2\, x \cdot x + ab\, x \cdot y,$$
$$by \cdot (ax + by) = ab\, x \cdot y + b^2\, y \cdot y.$$

Combining these yields

$$(ax + by) \cdot (ax + by) = a^2\, x \cdot x + 2ab\, x \cdot y + b^2\, y \cdot y.$$
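The identity of Example 7.1.5 can be spot-checked for the Euclidean inner product. This is only an illustrative sketch: the helper names `dot`, `add`, `scale`, and `expansion_gap` are hypothetical, and vectors are represented as plain Python lists of floats.

```python
def dot(u, v):
    """Euclidean inner product (7.1.1) of two equal-length lists."""
    return sum(uj * vj for uj, vj in zip(u, v))


def add(u, v):
    """Componentwise sum u + v."""
    return [uj + vj for uj, vj in zip(u, v)]


def scale(a, u):
    """Scalar multiple a*u."""
    return [a * uj for uj in u]


def expansion_gap(a, b, x, y):
    """|(ax+by).(ax+by) - (a^2 x.x + 2ab x.y + b^2 y.y)|, zero up to rounding."""
    w = add(scale(a, x), scale(b, y))
    lhs = dot(w, w)
    rhs = a * a * dot(x, x) + 2 * a * b * dot(x, y) + b * b * dot(y, y)
    return abs(lhs - rhs)


if __name__ == "__main__":
    print(expansion_gap(2.0, -3.0, [1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]))
```

A numerical check of this kind only tests the Euclidean case; the derivation in the example is what establishes the identity in every inner product space.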


Components of a Vector. We will typically denote by $e_j$ the vector consisting
of the $d$-tuple with all entries 0 except for the $j$th entry which is 1. Thus, $e_j =
(0, 0, \ldots, 0, 1, 0, \ldots, 0)$ with the 1 occurring in the $j$th position. Note that

$$e_j \cdot e_k = \delta_{jk},$$

where $\delta_{jk}$ is 1 if $j = k$ and it is 0 otherwise. This means that $\{e_j\}_{j=1}^d$ is an
orthonormal set in $\mathbb{R}^d$.

Note that if $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$, then the $j$th component $x_j$ of $x$ is given
by $x_j = x \cdot e_j$ for $j = 1, \ldots, d$.


Example 7.1.6. Show that each vector in $\mathbb{R}^d$ is a unique linear combination of
the vectors $e_j$ for $j = 1, \ldots, d$.

Solution: If $x = (x_1, x_2, \ldots, x_d)$, then

$$x = \sum_{j=1}^d x_j e_j = \sum_{j=1}^d (x \cdot e_j)\, e_j.$$

This is one way of expressing $x$ as a linear combination of the $e_j$'s. On the other
hand, if

$$x = \sum_{j=1}^d a_j e_j$$

is any such linear combination, then for $k = 1, \ldots, d$,

$$x_k = x \cdot e_k = \sum_{j=1}^d a_j\, e_j \cdot e_k = a_k,$$

since $e_j \cdot e_k = 1$ if $j = k$ and it is 0 otherwise. Thus the coefficients $a_j$ must be the
numbers $x_j$.


Norm and Distance. 


Definition 7.1.7. In an inner product space, we define the norm $\|x\|$ of a vector
$x$ to be the number

$$\|x\| = \sqrt{x \cdot x}.$$

The distance between two vectors $x$ and $y$ is defined to be $\|x - y\|$.


Note that, by Theorem 7.1.4(d), the norm of a vector is always non-negative 
and it is zero only if the vector is the zero vector. Thus, the distance between two 
vectors is always non-negative and it is zero if and only if the vectors are equal. 

In calculus, it is often shown that for two vectors $u$ and $v$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ the inner
product satisfies

$$u \cdot v = \|u\|\,\|v\| \cos\theta,$$

where $\theta$ is the angle between $u$ and $v$. Since $|\cos\theta| \le 1$, this implies that

$$|u \cdot v| \le \|u\|\,\|v\|.$$

As we show below, this inequality is true in $\mathbb{R}^d$ and, in fact, in any inner product
space. In this generality it is known as the Cauchy-Schwarz inequality.

Theorem 7.1.8 (Cauchy-Schwarz Inequality). If $X$ is an inner product space,
then

$$|u \cdot v| \le \|u\|\,\|v\|$$

for all $u, v \in X$.


Proof. If we take the inner product of a vector with itself, the result is non-negative
by (d) of Theorem 7.1.4. Thus, if $u$ and $v$ are vectors in $X$ and $t \in \mathbb{R}$ is a scalar,
then

$$0 \le (tu + v) \cdot (tu + v) = t^2\, u \cdot u + 2t\, u \cdot v + v \cdot v = at^2 + 2bt + c,$$

where $a = u \cdot u = \|u\|^2$, $b = u \cdot v$, and $c = v \cdot v = \|v\|^2$. The expression on the right
is a quadratic function of $t$ which is never negative. This means that the quadratic
equation

$$at^2 + 2bt + c = 0$$

has at most one real root (since the graph of $at^2 + 2bt + c$ cannot cross the $t$-axis).
By the quadratic formula, the roots of this equation are

$$\frac{-b \pm \sqrt{b^2 - ac}}{a}.$$

Since there cannot be two real roots, it must be the case that $b^2 \le ac$. On taking
the square root of both sides of this inequality, we obtain the inequality of the
theorem. □
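The two quantities that drive this proof can be computed directly for vectors in $\mathbb{R}^d$. The sketch below is an illustration only: the names `cauchy_schwarz_slack` and `discriminant` are hypothetical, and the sample vectors are arbitrary. The first function returns $\|u\|\,\|v\| - |u \cdot v|$ and the second returns $b^2 - ac$ with $a$, $b$, $c$ as in the proof.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(uj * vj for uj, vj in zip(u, v))


def norm(u):
    """Euclidean norm, the square root of u.u."""
    return math.sqrt(dot(u, u))


def cauchy_schwarz_slack(u, v):
    """||u|| ||v|| - |u.v|; non-negative by Theorem 7.1.8, up to rounding."""
    return norm(u) * norm(v) - abs(dot(u, v))


def discriminant(u, v):
    """b^2 - ac with a = u.u, b = u.v, c = v.v; non-positive by the proof."""
    a, b, c = dot(u, u), dot(u, v), dot(v, v)
    return b * b - a * c


if __name__ == "__main__":
    u, v = [1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]   # arbitrary sample vectors
    print(cauchy_schwarz_slack(u, v), discriminant(u, v))
```

For parallel vectors such as $u$ and $2u$, both quantities are zero up to rounding, which matches the equality case treated in Exercise 7.1.6.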


Let $u$ and $v$ be nonzero vectors in an inner product space. In view of the above theorem,
the number $\frac{u \cdot v}{\|u\|\,\|v\|}$ is always between $-1$ and 1 and, hence, it is the cosine of some
angle $\theta$ with $0 \le \theta \le \pi$. This leads to the following extension to arbitrary inner
product spaces of the notion of the angle between two vectors.

Definition 7.1.9. With $u$, $v$, and $\theta$ as above, we will call $\theta$ the angle between $u$
and $v$. This angle is $\pi/2$ if and only if $u \cdot v = 0$. In this case we will say that $u$ and
$v$ are mutually orthogonal and write $u \perp v$.


The Triangle Inequality. The triangle inequality is just the vector space version 
of the statement that the length of one side of a triangle is always less than or equal 
to the sum of the lengths of the other two sides. It is stated more precisely in part 
(a) of the following theorem. 


Theorem 7.1.10. If $X$ is an inner product space, $x, y \in X$, and $a \in \mathbb{R}$, then
(a) $\|x + y\| \le \|x\| + \|y\|$;
(b) $\|ax\| = |a|\,\|x\|$;
(c) $\|x\| = 0$ implies $x = 0$.


Proof. Using Example 7.1.5 and the Cauchy-Schwarz inequality, we have

$$\|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + 2\, x \cdot y + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

Part (a) of the theorem follows from this on taking square roots. Parts (b) and (c)
follow from (c) and (d) of Theorem 7.1.4. □


Suppose $u$, $v$, and $w$ are points in a vector space $X$. Then $\|u - v\|$, $\|v - w\|$,
and $\|u - w\|$ are the lengths of the sides of the triangle with vertices at $u$, $v$, and $w$.
If we apply part (a) of the previous theorem to the vectors $x = u - v$ and $y = v - w$,
the result is the inequality

$$(7.1.2)\qquad \|u - w\| \le \|u - v\| + \|v - w\|,$$

which says that a side of a triangle always has length less than or equal to the sum
of the lengths of the other two sides.


Norms in General. The norm induced by an inner product is just one type of
norm on a vector space. In general, a norm on a vector space $X$ is a non-negative
function $\|\cdot\|$ which satisfies (a), (b), and (c) of the previous theorem. A normed
vector space is a vector space $X$ together with a norm on $X$. There are norms on $\mathbb{R}^d$
which are different from the Euclidean norm (the norm induced by the Euclidean
inner product).


Definition 7.1.11. If $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$, we set
(1) $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_d|$;
(2) $\|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_d|\}$.


Example 7.1.12. Show that $\|\cdot\|_1$ is a norm on $\mathbb{R}^d$.

Solution: If $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$, then

$$\|x + y\|_1 = \sum_{j=1}^d |x_j + y_j| \le \sum_{j=1}^d (|x_j| + |y_j|),$$

by the triangle inequality for $\mathbb{R}$. The sum on the right is equal to

$$\sum_{j=1}^d |x_j| + \sum_{j=1}^d |y_j| = \|x\|_1 + \|y\|_1.$$

Thus, $\|\cdot\|_1$ satisfies the triangle inequality ((a) of Theorem 7.1.10).
If $a \in \mathbb{R}$, then

$$\|ax\|_1 = \sum_{j=1}^d |a x_j| = \sum_{j=1}^d |a|\,|x_j| = |a|\,\|x\|_1.$$

Thus, $\|\cdot\|_1$ also satisfies (b). That (c) holds as well is obvious, since $\|x\|_1 = 0$
implies that $x_j = 0$ for each $j$ and, hence, that $x = 0$.


We leave to the exercises the problem of showing that $\|\cdot\|_\infty$ is also a norm on
$\mathbb{R}^d$.

Theorem 7.1.13. The three norms we have defined on $\mathbb{R}^d$ are related as follows:

$$d^{-1}\|x\|_1 \le \|x\|_\infty \le \|x\| \le \|x\|_1$$

for each $x \in \mathbb{R}^d$.

The proof of this is also left to the exercises.
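The three norms of this subsection are easy to compute side by side. The following sketch is illustrative only: the function names are hypothetical, and the tolerance `tol` is an assumption added to absorb floating-point rounding. It evaluates the chain of inequalities in Theorem 7.1.13 for one given vector.

```python
import math


def norm_1(x):
    """The norm ||x||_1 of Definition 7.1.11(1)."""
    return sum(abs(t) for t in x)


def norm_inf(x):
    """The norm ||x||_infinity of Definition 7.1.11(2)."""
    return max(abs(t) for t in x)


def norm_2(x):
    """The Euclidean norm ||x||."""
    return math.sqrt(sum(t * t for t in x))


def satisfies_theorem_7_1_13(x, tol=1e-12):
    """Check (1/d)||x||_1 <= ||x||_inf <= ||x|| <= ||x||_1 for one vector."""
    d = len(x)
    return (norm_1(x) / d <= norm_inf(x) + tol
            and norm_inf(x) <= norm_2(x) + tol
            and norm_2(x) <= norm_1(x) + tol)


if __name__ == "__main__":
    x = [3.0, -4.0, 0.0]
    print(norm_1(x), norm_inf(x), norm_2(x), satisfies_theorem_7_1_13(x))
```

Checking sample vectors does not replace the proof requested in Exercise 7.1.9; it only shows what the inequalities say in a concrete case.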


The Normed Vector Space $C(I)$. In mathematics we deal with a great many
normed vector spaces. One that does not look at all like $\mathbb{R}^d$ is the space $C(I)$, where
$I$ is a closed bounded interval on the real line and $C(I)$ is the vector space of all
continuous real-valued functions on $I$. Addition is pointwise addition of functions
and scalar multiplication is multiplication of a function by a constant. It is easy to
see that $C(I)$ is a vector space under these two operations (Exercise 7.1.10). There
are many norms that can be put on this vector space, but perhaps the most useful
is the sup norm, $\|\cdot\|_\infty$, defined by

$$(7.1.3)\qquad \|f\|_\infty = \sup_{x \in I} |f(x)|,$$

for $f \in C(I)$. The problem of showing that this is a norm is left to the exercises.
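The sup norm (7.1.3) can be approximated by sampling. The sketch below is a rough illustration under stated assumptions: the name `sup_norm_on_grid` and the default sample count are hypothetical choices, and the value returned is the maximum of $|f|$ over finitely many grid points, which is only a lower bound for $\|f\|_\infty$.

```python
import math


def sup_norm_on_grid(f, a, b, samples=10001):
    """Maximum of |f| over an evenly spaced grid on [a, b].

    This is a lower bound for the sup norm of f on [a, b]; it approaches
    the sup norm as the grid is refined when f is continuous.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step)) for k in range(samples))


if __name__ == "__main__":
    print(sup_norm_on_grid(math.sin, 0.0, math.pi))
    print(sup_norm_on_grid(lambda x: x * (1 - x), 0.0, 1.0))
```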


Exercise Set 7.1

1. For the vectors $x = (1, 0, 2)$ and $y = (-1, 3, 1)$ in $\mathbb{R}^3$ find
   (a) $2x + y$;
   (b) $x \cdot y$;
   (c) $\|x\|$ and $\|y\|$;
   (d) the cosine of the angle between $x$ and $y$;
   (e) the distance from $x$ to $y$.

2. Using only the properties listed in Theorem 7.1.1, prove that if $u, v, w$ are
   vectors in a vector space and $u + w = v + w$, then $u = v$.

3. Using only the properties listed in Theorem 7.1.1, prove that if $u$ is a vector in
   a vector space, $a$ is a scalar, and $au = 0$, then either $a = 0$ or $u = 0$.

4. Prove Theorem 7.1.4.

5. Prove the second form of the triangle inequality. That is, prove that

   $$\big|\,\|x\| - \|y\|\,\big| \le \|x - y\|$$

   holds for any pair of vectors $x, y$ in a normed vector space. Hint: Use the first
   form (Theorem 7.1.10(a)) to prove the second form.

6. Prove that equality holds in the Cauchy-Schwarz inequality (Theorem 7.1.8) if
   and only if one of the vectors $u, v$ is a scalar multiple of the other.

7. For a norm on a vector space $X$, defined by an inner product as in Definition
   7.1.7, prove that the parallelogram law,

   $$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2,$$

   holds for all $x, y \in X$.

8. Prove that $\|\cdot\|_\infty$, as defined in Definition 7.1.11, is a norm on $\mathbb{R}^d$.

9. Prove Theorem 7.1.13.

10. Prove that the space $C(I)$, defined in the previous subsection, is a vector space.

11. Prove that the sup norm as defined in (7.1.3) is really a norm on $C(I)$.

12. Prove that if $\{x_k\}$ and $\{y_k\}$ are sequences of real numbers such that

    $$\sum_{k=1}^\infty x_k^2 < \infty \quad\text{and}\quad \sum_{k=1}^\infty y_k^2 < \infty, \quad\text{then}\quad \sum_{k=1}^\infty |x_k y_k| < \infty.$$

    Hint: What can you say about the corresponding finite sums?

13. Find a non-zero vector in $\mathbb{R}^3$ which is orthogonal to both $(1, 0, 2)$ and $(3, 1, 1)$.

14. Prove that if $u$ and $v$ are vectors in an inner product space and $u \perp v$, then

    $$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

7.2. Convergent Sequences of Vectors 


In this section we study convergence of sequences of vectors in $\mathbb{R}^d$. The definitions
and theorems in this topic are very similar to those of Chapter 2 on sequences of
numbers.


Metric Spaces. As long as we are working in a space with a reasonable notion 
of distance between points, we can define and study convergent sequences and 
continuous functions. Such a space is called a metric space. The precise conditions 
for a space to be a metric space are defined below. 


Definition 7.2.1. Let $X$ be a set and let $\delta$ be a function which assigns to each
pair $(x, y)$ of elements of $X$ a non-negative real number $\delta(x, y)$. Then $\delta$ is called a
metric on $X$ if, for all $x, y, z \in X$, the following conditions hold:

(a) $\delta(x, y) = \delta(y, x)$;
(b) $\delta(x, y) = 0$ if and only if $x = y$; and
(c) $\delta(x, z) \le \delta(x, y) + \delta(y, z)$.

A set $X$ together with a metric $\delta$ on $X$ is called a metric space. The number $\delta(x, y)$
is the distance between $x$ and $y$ in this metric space.


Conditions (a) and (b) above are called the symmetry and identity conditions, 
while condition (c) is the triangle inequality for metric spaces. 


We will show that $\mathbb{R}^d$ is a metric space, as is any normed vector space.


Theorem 7.2.2. If $X$ is a normed vector space, then $X$ is a metric space if its
metric $\delta$ is defined by

$$\delta(x, y) = \|x - y\|.$$

In particular, $\mathbb{R}^d$ is a metric space in the Euclidean norm, as is $C(I)$ in the sup
norm.


Proof. Parts (a), (b), and (c) of Theorem 7.1.10 are satisfied by the norm in any
normed vector space. Part (b) with $a = -1$ implies that $\|x - y\| = \|y - x\|$ and
so $\delta$ is symmetric. Part (c) implies that $\|x - y\| = 0$ if and only if $x = y$, and
so $\delta$ satisfies the identity condition. Part (a) implies (7.1.2), which shows that $\delta$
satisfies the triangle inequality. Thus, $\delta$ is a metric on $X$. □
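The three conditions of Definition 7.2.1 can be tested on a finite sample of points. The sketch below is illustrative and rests on assumptions: `violates_metric_axioms` is a hypothetical helper, points are tuples of numbers, and passing the check on a finite sample is evidence only, not a proof that a function is a metric. Failing it, however, does show that a function is not a metric.

```python
import itertools
import math


def euclidean_metric(x, y):
    """delta(x, y) = ||x - y|| for the Euclidean norm."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def violates_metric_axioms(delta, points, tol=1e-12):
    """Return True if (a), (b), or (c) of Definition 7.2.1 fails on the sample."""
    for x, y, z in itertools.product(points, repeat=3):
        if abs(delta(x, y) - delta(y, x)) > tol:          # (a) symmetry
            return True
        if (delta(x, y) == 0) != (x == y):                # (b) identity
            return True
        if delta(x, z) > delta(x, y) + delta(y, z) + tol:  # (c) triangle inequality
            return True
    return False


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (0, 1), (3, 4)]
    print(violates_metric_axioms(euclidean_metric, sample))

    # Squared distance on the line is symmetric and satisfies (b),
    # but it fails the triangle inequality: 4 > 1 + 1 for 0, 1, 2.
    squared = lambda x, y: (x[0] - y[0]) ** 2
    print(violates_metric_axioms(squared, [(0,), (1,), (2,)]))
```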


Remark 7.2.3. If $X$ is a metric space with metric $\delta$ and $Y$ is any subset of $X$,
then $Y$ is also a metric space with the same metric $\delta$. Thus, any subset of $\mathbb{R}^d$ is
also a metric space if it is given the usual Euclidean metric.

There are a great many metric spaces other than subsets of $\mathbb{R}^d$ that are important
in mathematics. We will explore some of these in the exercises.


Remark 7.2.4. The following statements summarize the relationship between the
types of spaces we have introduced so far:
(1) $\mathbb{R}^d$ is an inner product space;
(2) every inner product space is a normed vector space, with norm defined by
    $\|x\| = \sqrt{x \cdot x}$;
(3) every normed vector space is a metric space, with metric defined by $\delta(x, y) =
    \|x - y\|$.


Sequences. The definition of convergence for a sequence $\{x_n\}$ in $\mathbb{R}^d$ should
look familiar:

Definition 7.2.5. If $\{x_n\}$ is a sequence of vectors in $\mathbb{R}^d$ and $x \in \mathbb{R}^d$, then we say
$\{x_n\}$ converges to $x$ if for every $\epsilon > 0$ there is an $N \in \mathbb{R}$ such that

$$\|x - x_n\| < \epsilon \quad\text{whenever}\quad n > N.$$

In this case, we write $\lim_{n \to \infty} x_n = x$ or $\lim x_n = x$ or simply $x_n \to x$.

Note that we do not require the $N$ that appears in this definition to be an
integer.

Note also that the only thing we use about $\mathbb{R}^d$ in making this definition is the
notion of distance between points in $\mathbb{R}^d$. Quite clearly, the same definition can be
made for any metric space $X$ if we just replace $\|x - x_n\|$ by $\delta(x, x_n)$, where $\delta$ is the
metric on $X$. Thus, the definition of convergence for a sequence in a general metric
space is the following:


Definition 7.2.6. Let $X$ be a metric space with metric $\delta$. If $\{x_n\}$ is a sequence in
$X$ and $x \in X$, then we say $\{x_n\}$ converges to $x$ if for every $\epsilon > 0$ there is an $N \in \mathbb{R}$
such that

$$\delta(x, x_n) < \epsilon \quad\text{whenever}\quad n > N.$$

In this case, we write $\lim_{n \to \infty} x_n = x$ or $\lim x_n = x$ or simply $x_n \to x$.


We will not try to prove everything in this section in the context of general
metric spaces; after all, the object of study here is $\mathbb{R}^d$. However, we will point out
some theorems that we prove for $\mathbb{R}^d$ that can be proved in general metric spaces
or normed vector spaces or inner product spaces, and some of the exercises will be
devoted to verifying these claims.


Example 7.2.7. Let $x_n = (1/n^2, 1 + 1/n) \in \mathbb{R}^2$. Use Definition 7.2.5 to prove that
the sequence $\{x_n\}$ converges to $x = (0, 1)$.

Solution: We have $x - x_n = (-1/n^2, -1/n)$ and so

$$\|x - x_n\| = \sqrt{1/n^4 + 1/n^2} \le \sqrt{2/n^2} = \sqrt{2}/n.$$

Thus, given $\epsilon > 0$, if we choose $N = \sqrt{2}/\epsilon$, then

$$\|x - x_n\| \le \sqrt{2}/n < \sqrt{2}/N = \epsilon \quad\text{whenever}\quad n > N.$$

This completes the proof that $\lim x_n = x$.
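The choice $N = \sqrt{2}/\epsilon$ made in this example can be tried out numerically. The sketch below is only an illustration: the function names are hypothetical, and the check inspects finitely many indices $n > N$, so it supports the estimate without proving it.

```python
import math


def x_n(n):
    """The nth term (1/n^2, 1 + 1/n) of the sequence in Example 7.2.7."""
    return (1 / n ** 2, 1 + 1 / n)


def distance_to_limit(n):
    """||x - x_n|| for the limit x = (0, 1)."""
    first, second = x_n(n)
    return math.hypot(first - 0.0, second - 1.0)


def threshold(eps):
    """The N = sqrt(2)/eps chosen in the example."""
    return math.sqrt(2) / eps


def check(eps, how_many=1000):
    """Test ||x - x_n|| < eps for the first `how_many` integers n > N."""
    start = math.floor(threshold(eps)) + 1   # smallest integer n with n > N
    return all(distance_to_limit(n) < eps for n in range(start, start + how_many))


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.001):
        print(eps, threshold(eps), check(eps))
```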


Many limit proofs for sequences in $\mathbb{R}^d$ follow the same pattern as in the above
example. We showed that $\|x - x_n\| \le \sqrt{2}/n$ and then used the fact that $\sqrt{2}/n$
can be made less than $\epsilon$ by making $n$ large enough; that is, we used the fact that
$\lim \sqrt{2}/n = 0$. We can save some effort in future proofs by formalizing in a theorem
the method that was used here. The theorem is a vector version of Theorem 2.3.1.
In fact, it follows immediately from Theorem 2.3.1 and the fact (obvious from the
definition of limit) that $\lim x_n = x$ if and only if $\lim \|x_n - x\| = 0$.


Theorem 7.2.8. Let $\{x_n\}$ be a sequence in $\mathbb{R}^d$ and let $x$ be a vector in $\mathbb{R}^d$. If there
is a sequence $\{a_n\}$ of non-negative real numbers such that

$$\|x - x_n\| \le a_n \quad\text{for all } n$$

and if $\lim a_n = 0$, then $\lim x_n = x$.

Note that, since the proof of this theorem uses nothing about $\mathbb{R}^d$ but the
existence of a metric and the definition of limit, the theorem holds in any metric
space (if $\|x - x_n\|$ is replaced by $\delta(x, x_n)$).

Example 7.2.9. If $x_n = (e^{-n}\sin n, e^{-n}) \in \mathbb{R}^2$, prove that $\lim x_n = 0$.

Solution: We have

$$\|x_n - 0\| = \|x_n\| = \sqrt{e^{-2n}(\sin^2 n + 1)} \le 2e^{-n} = 2/e^n.$$

Since $\lim 2/e^n = 0$, the previous theorem tells us that $\lim x_n = 0$.


Limit Theorems. The following theorem says that the limit of a sequence, if 
it exists, is unique. Its proof is identical to the proof of Theorem 2.1.6. We won't 
repeat it here. The analogous theorem for metric spaces is also true and also has 
the same proof. 


Theorem 7.2.10. If $\{x_n\}$ is a sequence in $\mathbb{R}^d$ and $x, y \in \mathbb{R}^d$ with $x_n \to x$ and
$x_n \to y$, then $x = y$.

Theorem 7.2.11. If $\lim x_n = x$ for a sequence $\{x_n\}$ in $\mathbb{R}^d$, then $\lim \|x_n\| = \|x\|$.


Proof. The second form of the triangle inequality tells us that

$$\big|\,\|x\| - \|x_n\|\,\big| \le \|x - x_n\|.$$

If $\lim x_n = x$, then the sequence of numbers on the right converges to 0. It follows
that the one on the left also converges to 0. Thus, $\lim \|x_n\| = \|x\|$. □


Next is the vector version of the Main Limit Theorem (Theorem 2.3.6). 


Theorem 7.2.12. If $\{x_n\}$ and $\{y_n\}$ are sequences of vectors in $\mathbb{R}^d$ and $\{a_n\}$ is a
sequence of scalars and if $x_n \to x \in \mathbb{R}^d$, $y_n \to y \in \mathbb{R}^d$, and $a_n \to a$, then

(a) $x_n + y_n \to x + y$;
(b) $a_n x_n \to ax$; and
(c) $x_n \cdot y_n \to x \cdot y$.

Proof. (a) By the triangle inequality, we have

$$\|x + y - (x_n + y_n)\| \le \|x - x_n\| + \|y - y_n\|.$$

Since $x_n \to x$ and $y_n \to y$, we have that $\|x - x_n\| \to 0$ and $\|y - y_n\| \to 0$. Thus,
$\|x - x_n\| + \|y - y_n\| \to 0$ and it follows from Theorem 7.2.8 that $x_n + y_n \to x + y$.

(b) We have

$$\|ax - a_n x_n\| = \|a(x - x_n) + (a - a_n)x_n\| \le |a|\,\|x - x_n\| + |a - a_n|\,\|x_n\|.$$

Since $\|x - x_n\| \to 0$, $|a - a_n| \to 0$, and $\|x_n\| \to \|x\|$ (by the previous theorem), the
expression on the right converges to 0. By Theorem 7.2.8 again, $\lim a_n x_n = ax$.

(c) The proof of this is similar to the proof of (b). The details are left to the
exercises. □


Note that the proofs of (a) and (b) above use only properties of $\mathbb{R}^d$ that are
also true in any normed vector space, and so they hold in this much more general
context. The proof of (c) uses only properties of $\mathbb{R}^d$ that hold in any inner product
space and so (c) is true in any inner product space.


The next theorem tells us that a sequence of vectors converges if and only if it 
converges componentwise. 


Theorem 7.2.13. A sequence $\{x_n\}$ in $\mathbb{R}^d$ converges to $x \in \mathbb{R}^d$ if and only if each
component of $\{x_n\}$ converges to the corresponding component of $x$; that is, if and
only if $\lim x_n \cdot e_j = x \cdot e_j$ for $j = 1, \ldots, d$.

Proof. If $\lim_{n \to \infty} x_n = x$, then $\lim_{n \to \infty} x_n \cdot e_j = x \cdot e_j$ for each $j$ by Theorem
7.2.12(c).

To prove the converse, we suppose $\lim_{n \to \infty} x_n \cdot e_j = x \cdot e_j$ for each $j$. We note
that this implies that $\lim_{n \to \infty} |(x_n - x) \cdot e_j|^2 = 0$ for each $j$. We have

$$\|x_n - x\| = \left(\sum_{j=1}^d |(x_n - x) \cdot e_j|^2\right)^{1/2}.$$

Each term in the sum on the right converges to 0 and, hence, the sum and its square
root also converge to 0. We conclude that $\lim x_n = x$. □


Bolzano-Weierstrass Theorem. A version of the Bolzano-Weierstrass Theorem
(Theorem 2.5.5) holds for bounded sequences in $\mathbb{R}^d$, where a sequence in $\mathbb{R}^d$ is
bounded if there is a number $M$ such that $\|x_n\| \le M$ for all $n$.

Theorem 7.2.14 (Bolzano-Weierstrass Theorem). Each bounded sequence in
$\mathbb{R}^d$ has a convergent subsequence.


Proof. We will prove this by induction on the dimension $d$ of the Euclidean space.
It is, of course, true for $d = 1$ by the single variable version of the Bolzano-
Weierstrass Theorem (Theorem 2.5.5).

Suppose $d > 1$ and the theorem is true for Euclidean space of dimension $d - 1$.
Let $\{x_n\}$ be a bounded sequence in $\mathbb{R}^d$. Then there is an $M \in \mathbb{R}$ such that $\|x_n\| \le
M$ for all $n$.

We identify $\mathbb{R}^d$ with the Cartesian product $\mathbb{R}^{d-1} \times \mathbb{R}$. This is the space of all
pairs $(y, z)$, where $y \in \mathbb{R}^{d-1}$ and $z \in \mathbb{R}$. That is, if $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, then we
identify $x$ with the pair $(y, z)$, where $y = (x_1, x_2, \ldots, x_{d-1})$ and $z = x_d$. If this is
done, notice that

$$\|y\| \le \|x\| \quad\text{and}\quad |z| \le \|x\|.$$

Thus, if we write each element of the sequence $\{x_n\}$ in the form $x_n = (y_n, z_n) \in
\mathbb{R}^{d-1} \times \mathbb{R}$, then $\|y_n\| \le \|x_n\| \le M$ and $|z_n| \le \|x_n\| \le M$. This implies that the
sequences $\{y_n\}$ and $\{z_n\}$ are both bounded.

By the induction assumption, the sequence $\{y_n\}$ has a convergent subsequence
$\{y_{n_k}\}$. The corresponding subsequence $\{z_{n_k}\}$ of the sequence $\{z_n\}$ is still bounded,
and so it has a convergent subsequence. By replacing $\{y_{n_k}\}$ with a (still convergent)
subsequence of itself, we may assume that $\{z_{n_k}\}$ itself converges.

The component sequences of $\{x_{n_k}\}$ are those of $\{y_{n_k}\}$, which all converge since
$\{y_{n_k}\}$ converges, and the sequence $\{z_{n_k}\}$, which converges. Thus, $\{x_{n_k}\}$ converges
since all of its component sequences converge.

We conclude that every bounded sequence in $\mathbb{R}^d$ has a convergent subsequence.
This completes the induction and finishes the proof of the theorem. □


Cauchy Sequences. Cauchy sequences in $\mathbb{R}^d$ are defined in the same way that
Cauchy sequences of numbers were defined in Definition 2.5.7.

Definition 7.2.15. A sequence $\{x_n\}$ in $\mathbb{R}^d$ is said to be a Cauchy sequence if, for
every $\epsilon > 0$, there is an $N$ such that

$$\|x_n - x_m\| < \epsilon \quad\text{whenever}\quad n, m > N.$$


The following theorem is proved using the Bolzano-Weierstrass Theorem in
exactly the same way that its single variable counterpart (Theorem 2.5.8) was
proved. We won't repeat the proof.

Theorem 7.2.16. A sequence $\{x_n\}$ in $\mathbb{R}^d$ is a Cauchy sequence if and only if it
converges.


To prove directly from the definition that a certain sequence converges, it is 
necessary to have in hand the element to which it converges. On the other hand, the 
definition of a Cauchy sequence involves only the elements of the sequence. Hence, 
the above theorem provides a way to prove that a sequence converges without 
having already identified the limit. 


Clearly, Cauchy sequences can be defined in any metric space: simply replace
"$\|x_n - x_m\|$" in the above definition by "$\delta(x_n, x_m)$", where $\delta$ is the metric. However,
the analogue of Theorem 7.2.16 is not true in general for metric spaces. A metric
space in which it is true is said to be complete. Thus, $\mathbb{R}^d$ is a complete metric space.
An example of a metric space which is not complete follows.


Example 7.2.17. Let the interval $(0, 1)$ be considered a metric space with the
usual distance between points as metric. Show that this is not a complete metric
space.

Solution: The sequence $\{1/n\}$ is a Cauchy sequence since it converges in $\mathbb{R}$
to the point 0. However, since $0 \notin (0, 1)$, this sequence does not converge in the
metric space $(0, 1)$. Hence, $(0, 1)$ is not a complete metric space.


Exercise Set 7.2

1. Using only the definition of the limit of a sequence in $\mathbb{R}^d$, prove that


ri 
lim (—"_, ——" ) = (1,-1). 
l+n’ on 


In each of the next four problems, decide if the sequence $\{x_n\}$ converges and find
its limit if it does. Use limit theorems to justify your answers.

2. $x_n = \left(\frac{n^2 + n - 1}{3n^2 + 2}, \frac{n - 1}{n + 1}\right)$.

3. $x_n = (1 + (-1)^n, 1/n, 1 + 1/n)$.

4. $x_n = (2^{-n}\sin(n\pi/4), 2^{-n}\cos(n\pi/4))$.

5. $x_n = (\ln(n + 1) - \ln n, \sin(1/n))$.

6. Let $\{x_n\}$ and $\{y_n\}$ be sequences in $\mathbb{R}^d$. Prove that if $\lim x_n = 0$ and if $\{y_n\}$ is
   bounded, then $\lim x_n \cdot y_n = 0$.

7. Let $\{x_n\}$ be a bounded sequence in $\mathbb{R}^d$ and let $\{a_n\}$ be a bounded sequence of
   scalars. Prove that if either sequence has limit 0, then so does the sequence
   $\{a_n x_n\}$.

8. Prove that every convergent sequence in $\mathbb{R}^d$ is bounded.

9. If $x_n = (\sin n, \cos n, 1 + (-1)^n)$, does the sequence $\{x_n\}$ in $\mathbb{R}^3$ have a convergent
   subsequence? Justify your answer.

10. Prove part (c) of Theorem 7.2.12.

11. If $x_n = (1/n, \sin(\pi n/2))$, find three convergent subsequences of $\{x_n\}$ which
    converge to three different limits.

12. If, for $x, y \in \mathbb{R}$, we set $\delta(x, y) = 0$ if $x = y$ and $\delta(x, y) = 1$ if $x \ne y$, prove that
    the result is a metric on $\mathbb{R}$. Thus, $\mathbb{R}$ with this metric is a metric space, one
    that is quite different from $\mathbb{R}$ with the usual metric.

13. Which sequences converge in the metric space of the previous exercise?

14. Let $a$ and $b$ be points of $\mathbb{R}^2$ and let $X$ be the set of all smooth parameterized
    curves joining $a$ to $b$ in $\mathbb{R}^2$, with parameter interval $[0, 1]$. That is, $X$ is the set
    of all continuously differentiable functions $\gamma : [0, 1] \to \mathbb{R}^2$, with $\gamma(0) = a$ and
    $\gamma(1) = b$. Show that if

    $$\delta(\gamma_1, \gamma_2) = \sup\{\|\gamma_1(t) - \gamma_2(t)\| : t \in [0, 1]\},$$

    then $\delta$ is a metric on $X$.

15. Show that the metric space of the previous exercise is not complete.

16. Let $S$ be the surface of a sphere in $\mathbb{R}^3$. For $x, y \in S$ let $\delta(x, y)$ be the length of
    the shortest path on $S$ joining $x$ to $y$. Show that this is a metric on $S$.

17. Imagine a large building with many rooms. Let $X$ be the set of rooms in this
    building and let $\delta(x, y)$ be the length of the shortest path along the hallways
    and stairways of the building that leads from room $x$ to room $y$. Show that $\delta$
    is a metric on $X$.


7.3. Open and Closed Sets 


The open ball $B_r(x_0)$ and closed ball $\overline{B}_r(x_0)$, centered at $x_0 \in \mathbb{R}^d$, with radius
$r > 0$, are defined by

$$B_r(x_0) = \{x \in \mathbb{R}^d : \|x - x_0\| < r\} \quad\text{and}\quad \overline{B}_r(x_0) = \{x \in \mathbb{R}^d : \|x - x_0\| \le r\}.$$

Of course, open and closed balls centered at a given point and with a given radius
may be defined in any metric space: one simply uses the metric distance $\delta(x, x_0)$
in place of the distance $\|x - x_0\|$ defined by the norm in $\mathbb{R}^d$.

Open intervals and closed intervals on the real line play an important part in
the calculus of one variable. Open and closed balls are the direct analogues in $\mathbb{R}^d$
of open and closed intervals on the line. However, the geometry of $\mathbb{R}^d$ is much more
complicated than that of the line. We will need the concepts of open and closed for
sets that are far more complicated than balls. This leads to the following definition.


Definition 7.3.1. If $U$ is a subset of $\mathbb{R}^d$, we will say that $U$ is open if, for each
point $x \in U$, there is an open ball centered at $x$ which is contained in $U$. We will
say that a subset of $\mathbb{R}^d$ is closed if its complement is open. A neighborhood of a
point $x \in \mathbb{R}^d$ is any open set which contains $x$.


It might seem obvious that open balls are open sets and closed balls are closed 
sets. However, that is only because we have chosen to call them open balls and 
closed balls. We actually have to prove that they satisfy the conditions of the 
preceding definition. We do this in the next theorem. 


Theorem 7.3.2. In $\mathbb{R}^d$,

(a) the empty set $\emptyset$ is both open and closed;
(b) the whole space $\mathbb{R}^d$ is both open and closed;
(c) each open ball is open;
(d) each closed ball is closed.


Proof. The empty set $\emptyset$ is open because it has no points, and so the condition that
a set be open, stated in Definition 7.3.1, is vacuously satisfied. The set $\mathbb{R}^d$ is open
because it contains any open ball centered at any of its points. Thus, $\emptyset$ and $\mathbb{R}^d$ are
both open. Since they are complements of one another, they are also both closed.


Figure 7.3.1. Proving Theorem 7.3.2(c) and (d).


To prove (c), we suppose $B_r(x_0)$ is an open ball and $y$ is one of its points. Then
$\|y - x_0\| < r$ and so, if we set $s = r - \|y - x_0\|$, then $s > 0$. Also, if $x \in B_s(y)$,
then $\|x - y\| < s$ and so

$$\|x - x_0\| \le \|x - y\| + \|y - x_0\| < s + \|y - x_0\| = r,$$

which means $x \in B_r(x_0)$ (see Figure 7.3.1). Thus, we have shown that, for each
$y \in B_r(x_0)$, there is an open ball, $B_s(y)$, centered at $y$, which is contained in $B_r(x_0)$.
By definition, this means that $B_r(x_0)$ is open. This completes the proof of (c).
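
The choice of radius in the proof of (c) can be checked numerically. The following is only an illustrative sketch, not part of the proof: the helper names (`norm`, `inner_radius`, `sample_in_ball`) and the particular center, radius, and point are assumptions made for the example.

```python
import math
import random

def norm(v):
    """Euclidean norm of a tuple of coordinates."""
    return math.sqrt(sum(c * c for c in v))

def sub(u, v):
    """Coordinate-wise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def inner_radius(x0, r, y):
    """The radius s = r - ||y - x0|| chosen in the proof of (c)."""
    return r - norm(sub(y, x0))

def sample_in_ball(center, s, rng):
    """Rejection-sample one point of the open ball B_s(center)."""
    while True:
        offset = tuple(rng.uniform(-s, s) for _ in center)
        if norm(offset) < s:
            return tuple(c + o for c, o in zip(center, offset))

rng = random.Random(0)                 # fixed seed, so the run is repeatable
x0, r = (0.0, 0.0, 0.0), 1.0           # hypothetical ball B_r(x0) in R^3
y = (0.5, 0.3, -0.2)                   # a point of B_r(x0)
s = inner_radius(x0, r, y)
points = [sample_in_ball(y, s, rng) for _ in range(1000)]
# Every sampled point of B_s(y) should also lie in B_r(x0).
all_inside = all(norm(sub(p, x0)) < r for p in points)
```

A finite sample cannot replace the triangle-inequality argument; it only illustrates that $s = r - \|y - x_0\|$ is a radius that works.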

To prove (d), we consider a closed ball $\overline{B}_r(x_0)$. To prove that it is a closed set,
we must show its complement is open. Suppose $y$ is a point in its complement. This
means $y \in \mathbb{R}^d$ but $y \notin \overline{B}_r(x_0)$, and so $\|y - x_0\| > r$. This time we set $s = \|y - x_0\| - r$
and we claim that the open ball $B_s(y)$ is contained in the complement of $\overline{B}_r(x_0)$.
In fact, if $x \in B_s(y)$, then $\|x - y\| < s$ and so, by the second form of the triangle
inequality (Theorem 2.1.2(b)),

$$\|x - x_0\| \ge \|y - x_0\| - \|x - y\| > \|y - x_0\| - s = r,$$

which means $x$ is in the complement of $\overline{B}_r(x_0)$. Thus, we have proved that each
point of the complement of $\overline{B}_r(x_0)$ is the center of an open ball contained in the
complement of $\overline{B}_r(x_0)$. This proves that this complement is open, hence, that
$\overline{B}_r(x_0)$ is closed. □

The above theorem holds in any metric space and it has the same proof. The
same thing is true of the next theorem. It tells us that the collection of all open
subsets of $\mathbb{R}^d$ forms what is called a topology. A topology for a space $X$ is a collection
of sets which are declared to be the open sets of the space. This collection must
contain the empty set and the space $X$ and must have the property that it is closed
under arbitrary unions and finite intersections. A space $X$ with a specified topology
is called a topological space.


Theorem 7.3.3. In $\mathbb{R}^d$,

(a) the union of an arbitrary collection of open sets is open;
(b) the intersection of any finite collection of open sets is open;
(c) the intersection of an arbitrary collection of closed sets is closed;
(d) the union of any finite collection of closed sets is closed.


Proof. If $\mathcal{V}$ is an arbitrary collection of open sets and $U = \bigcup \mathcal{V}$ is its union, then $x$
is in $U$ if and only if it is in at least one of the sets in $\mathcal{V}$. Suppose $x \in V$ with $V$ in
$\mathcal{V}$. Then, since $V$ is open, there is a ball $B_r(x)$, centered at $x$, which is contained
in $V$. Since $V \subset U$, this ball is also contained in $U$. This proves that $U$ is open
and completes the proof of (a).


Now suppose $\{V_1, V_2, \ldots, V_n\}$ is a finite collection of open sets and

$$x \in U = V_1 \cap V_2 \cap \cdots \cap V_n.$$

Then, since each $V_k$ is open, there exists for each $k$ a radius $r_k$ such that $B_{r_k}(x) \subset
V_k$. If $r = \min\{r_1, r_2, \ldots, r_n\}$, then $B_r(x) \subset V_k$ for every $k$, which implies that
$B_r(x) \subset U$. It follows that $U$ is open. This completes the proof of (b).
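
The minimum-radius step in the proof of (b) can be illustrated on the real line. This is a minimal sketch under stated assumptions: the open sets are taken to be the intervals $(-1/n, 1/n)$ for $n = 1, \ldots, 5$, and the function names are hypothetical.

```python
from fractions import Fraction

def radius_in_interval(x, interval):
    """A radius r_k with (x - r_k, x + r_k) contained in the open interval (a, b)."""
    a, b = interval
    if not (a < x < b):
        raise ValueError("x must lie in the open interval")
    return min(x - a, b - x)

def radius_in_intersection(x, intervals):
    """r = min{r_1, ..., r_n}, as in the proof of (b)."""
    return min(radius_in_interval(x, iv) for iv in intervals)

# Assumed example: V_n = (-1/n, 1/n) for n = 1, ..., 5, and x = 0.
intervals = [(Fraction(-1, n), Fraction(1, n)) for n in range(1, 6)]
x = Fraction(0)
r = radius_in_intersection(x, intervals)
# The ball (x - r, x + r) sits inside every one of the finitely many intervals.
inside_all = all(a <= x - r and x + r <= b for a, b in intervals)
```

With finitely many sets the minimum is a positive number. If all of the intervals $(-1/n, 1/n)$ were used, the radii $1/n$ would have infimum $0$ and the intersection would be the single point $\{0\}$, which is not open; this is why (b) is stated only for finite collections.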

The proofs of the corresponding statements (c) and (d) for closed sets follow
from those for open sets by taking complements. We leave the details to Exercise
7.3.5. □


Remark 7.3.4. An easy consequence of the above theorem is that if $U$ is open
and $K$ is closed and if $K \subset U$, then the set-theoretic difference $U \setminus K$ is open. On
the other hand, if $U \subset K$, then $K \setminus U$ is closed (Exercise 7.3.6).


Example 7.3.5. If $0 < r < R$, prove that the annulus

$$A = \{x \in \mathbb{R}^2 : r < \|x\| < R\}$$

is open.

Solution: The ball $B_R(0)$ is open, the ball $\overline{B}_r(0)$ is closed, and $A$ is the
set-theoretic difference $B_R(0) \setminus \overline{B}_r(0)$. Thus, by the previous remark, $A$ is open.

A similar argument shows that an annulus of the form

$$\{x \in \mathbb{R}^2 : r \le \|x\| \le R\}$$

is closed.

Interior, Closure, and Boundary. If $E$ is a subset of $\mathbb{R}^d$, then the union of all
open subsets of $E$ is open, by Theorem 7.3.3. By construction, it is a subset of $E$
which contains all open subsets of $E$. Thus, every subset of $\mathbb{R}^d$ contains a largest
open subset, that is, an open subset which contains all other open subsets.

Similarly, the intersection of all closed sets containing $E$ is a closed set containing
$E$ and it is contained in every closed set containing $E$. Thus, it is the smallest
closed set containing $E$.


Definition 7.3.6. Let $E$ be a subset of $\mathbb{R}^d$. Then:

(a) the largest open subset of $E$ is called the interior of $E$ and is denoted $E^\circ$;
(b) the smallest closed set containing $E$ is called the closure of $E$ and is denoted $\overline{E}$;
(c) the set $\overline{E} \setminus E^\circ$ is called the boundary of $E$ and is denoted $\partial E$.


Figure 7.3.2. The Set $E$ of Example 7.3.8, Its Interior $E^\circ$, and Closure $\overline{E}$.


Note that these concepts can be defined in exactly the same way in any topo- 
logical space and, in particular, in any metric space. 

Recall that a neighborhood of a point $x \in \mathbb{R}^d$ is any open set containing $x$.
The proof of the following theorem is elementary and is left to the exercises. This 
theorem also holds in any metric space. 


Theorem 7.3.7. Let $E$ be a subset of $\mathbb{R}^d$ and let $x$ be an element of $\mathbb{R}^d$. Then:

(a) $x \in E^\circ$ if and only if there is a neighborhood of $x$ that is contained in $E$;
(b) $x \in \overline{E}$ if and only if every neighborhood of $x$ contains a point of $E$;
(c) $x \in \partial E$ if and only if every neighborhood of $x$ contains points of $E$ and points
of the complement of $E$.


Example 7.3.8. Find the interior, closure, and boundary for the set

$$E = \{(x,y) \in \mathbb{R}^2 : \|(x,y)\| < 1,\ y > 0\} \cup \{(0,-y) : y \in [0,1]\}.$$

Solution: It is immediate from the previous theorem that

$$E^\circ = \{(x,y) \in \mathbb{R}^2 : \|(x,y)\| < 1,\ y > 0\},$$
$$\overline{E} = \{(x,y) \in \mathbb{R}^2 : \|(x,y)\| \le 1,\ y \ge 0\} \cup \{(0,-y) : y \in [0,1]\},$$
$$\partial E = \{(x,y) \in \mathbb{R}^2 : \|(x,y)\| = 1,\ y \ge 0\} \cup ([-1,1] \times \{0\}) \cup \{(0,-y) : y \in [0,1]\}.$$

See Figure 7.3.2.


Sequences. The concepts of open and closed sets are intimately connected to
the concept of convergence of a sequence.


Theorem 7.3.9. A sequence $\{x_n\}$ in $\mathbb{R}^d$ converges to $x \in \mathbb{R}^d$ if and only if, for
every neighborhood $U$ of $x$, there is a number $N$ such that $x_n \in U$ whenever $n > N$.


Proof. If for every neighborhood $U$ of $x$ there is an $N$ such that $x_n \in U$ whenever
$n > N$, then this is true, in particular, for each neighborhood of the form $B_\epsilon(x)$
with $\epsilon > 0$. This means that for each $\epsilon > 0$ there is an $N$ such that $\|x - x_n\| < \epsilon$
whenever $n > N$. That is, $\lim x_n = x$.

Conversely, if $\lim x_n = x$ and $U$ is any neighborhood of $x$, we may choose an
$\epsilon > 0$ such that the ball $B_\epsilon(x)$ is contained in $U$. By the definition of limit, for this
$\epsilon$ there is an $N$ such that $\|x - x_n\| < \epsilon$ whenever $n > N$. Then $x_n \in B_\epsilon(x) \subset U$
whenever $n > N$. This completes the proof. □


Theorem 7.3.10. If $A$ is a subset of $\mathbb{R}^d$, then $\overline{A}$ is the set of all limits of convergent
sequences in $A$. The set $A$ is closed if and only if every convergent sequence in $A$
converges to a point of $A$.


Proof. If $x \in \overline{A}$, then each neighborhood of $x$ contains a point of $A$ by Theorem
7.3.7(b). In particular, each neighborhood of the form $B_{1/n}(x)$, for $n \in \mathbb{N}$, contains
a point of $A$. We choose one and call it $x_n$. Since $\|x - x_n\| < 1/n$, the sequence
$\{x_n\}$ converges to $x$. Thus, each point in the closure of $A$ is the limit of a sequence
in $A$.

Conversely, suppose $x = \lim x_n$ for some sequence $\{x_n\}$ in $A$. By the previous
theorem, each neighborhood of $x$ contains points in this sequence. In particular,
each neighborhood of $x$ contains a point of $A$. Hence, $x \in \overline{A}$ by Theorem 7.3.7(b).

Since a set is closed if and only if it is its own closure, it follows that $A$ is closed
if and only if it contains all limits of convergent sequences in $A$. □
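
As a concrete instance of the construction in the first half of the proof, take $A = (0,1) \subset \mathbb{R}$ and the closure point $x = 0$. The sketch below is only an illustration; the choice $x_n = 1/(n+1)$ is one assumed way of picking a point of $A$ inside $B_{1/n}(x)$.

```python
from fractions import Fraction

def in_A(t):
    """Membership test for the assumed set A = (0, 1)."""
    return 0 < t < 1

x = Fraction(0)  # a point of the closure of A that is not in A
# One possible choice of x_n in A with |x - x_n| < 1/n, for n = 1, ..., 50.
seq = [Fraction(1, n + 1) for n in range(1, 51)]
ok = all(
    in_A(t) and abs(t - x) < Fraction(1, n)
    for n, t in enumerate(seq, start=1)
)
```

The sequence lies in $A$ and converges to $0$, which is not in $A$; by the theorem this shows again that $(0,1)$ is not closed.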


Exercise Set 7.3 


1. Prove that the set $\{(x,y) \in \mathbb{R}^2 : y > 0\}$ is an open subset of $\mathbb{R}^2$.

2. Prove that every finite subset of $\mathbb{R}^d$ is closed.

3. Find the interior, closure, and boundary for the set
$\{(x,y) \in \mathbb{R}^2 : 0 < x < 2,\ 0 < y < 1\}$.

4. Find the interior, closure, and boundary for the set
$\{(x,y) \in \mathbb{R}^2 : \|(x,y)\| < 1\} \cup \{(x,y) \in \mathbb{R}^2 : y = 0,\ -2 < x < 2\}$.

5. Prove (c) and (d) of Theorem 7.3.3.

6. Let $A$ be an open set and $B$ a closed set. If $B \subset A$, prove that $A \setminus B$ is open.
If $A \subset B$, prove that $B \setminus A$ is closed.

7. Prove Theorem 7.3.7.

8. If $E$ is a subset of $\mathbb{R}^d$, is the interior of the closure of $E$ necessarily the same
as the interior of $E$? Justify your answer.

9. If $A$ and $B$ are subsets of $\mathbb{R}^d$, show that $\overline{A \cup B} = \overline{A} \cup \overline{B}$. Is the analogous
statement true for $A \cap B$? Justify your answer.

10. If $A$ and $B$ are subsets of $\mathbb{R}^d$, prove that $(A \cap B)^\circ = A^\circ \cap B^\circ$. Is the analogous
statement true for $A \cup B$? Justify your answer.

11. Let $\{x_n\}$ be a convergent sequence in $\mathbb{R}^d$ with limit $x$. Set
$A = \{x_1, x_2, x_3, \ldots\} \cup \{x\}$;
that is, $A$ is the set consisting of all the points occurring in the sequence together
with the limit $x$. Show that $A$ is a closed set.

12. Let $\{x_n\}$ be any sequence in $\mathbb{R}^d$ and let $A$ be the set consisting of the points
that occur in this sequence. Prove that the closure of $A$ consists of $A$ together
with all limits of convergent subsequences of $\{x_n\}$.

13. Show that Theorem 7.3.10 remains true if $\mathbb{R}^d$ is replaced by any metric space.

14. Find the interior and closure of the set $\mathbb{Q}$ of rationals in $\mathbb{R}$.

15. If $E$ is a subset of $\mathbb{R}^d$, show that $(\overline{E})^c = (E^c)^\circ$.


7.4. Compact Sets 


In this section and the next, we study two topological properties, compactness and
connectedness, that a subset of $\mathbb{R}^d$ may or may not have. A topological property
of a set $E$ is one that can be described using only knowledge of the open sets of $\mathbb{R}^d$
and their relationship to $E$. Thus, they are properties that can be defined in any
topological space. Compactness and connectedness are two such properties.


Open Covers. An open cover of a set $E \subset \mathbb{R}^d$ is a collection of open sets whose
union contains $E$. An open cover of a set $E$ may or may not have a finite subcover,
that is, there may or may not be finitely many sets in the collection which also
form a cover of $E$.


Example 7.4.1. The collection $\mathcal{U}$ of all open intervals of length 1/2 and with
rational endpoints is clearly an open cover of the interval $[0,1]$. Show that it has a
finite subcover.

Solution: The three intervals $(-1/8, 3/8)$, $(1/4, 3/4)$, and $(5/8, 9/8)$ belong to
$\mathcal{U}$ and they cover $[0,1]$.
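
The claim that these three intervals cover $[0,1]$ can be verified mechanically. The following is a sketch only; `covers_closed_interval` is a hypothetical helper that walks from the left endpoint to the right, always jumping to the largest right endpoint among the intervals that contain the current point.

```python
from fractions import Fraction as F

def covers_closed_interval(intervals, lo, hi):
    """True if the open intervals (a, b) together contain every point of [lo, hi]."""
    point = lo  # invariant: every point of [lo, point) is already covered
    while True:
        ends = [b for a, b in intervals if a < point < b]
        if not ends:
            return False       # `point` lies in [lo, hi] but in no interval
        point = max(ends)      # all of [lo, max(ends)) is now covered
        if point > hi:
            return True

subcover = [(F(-1, 8), F(3, 8)), (F(1, 4), F(3, 4)), (F(5, 8), F(9, 8))]
all_length_half = all(b - a == F(1, 2) for a, b in subcover)
covered = covers_closed_interval(subcover, F(0), F(1))
# Dropping the middle interval leaves the point 3/8 uncovered.
covered_without_middle = covers_closed_interval(
    [subcover[0], subcover[2]], F(0), F(1)
)
```

Exact rational arithmetic is used so that the endpoint comparisons are not affected by rounding.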


Example 7.4.2. The collection $\{(1/n, 1) : n = 1, 2, \ldots\}$ is a collection of open sets
which covers $(0,1)$. Does it have a finite subcover?

Solution: No. Since this collection of intervals is nested upward, any finite
subcollection has a largest interval $(1/m, 1)$. Then the union of the sets in the
subcollection is just $(1/m, 1)$ and this does not contain $(0,1)$.
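
The argument in the solution produces an explicit uncovered point for any finite subcollection. The sketch below assumes the subcollection is described by a finite list of indices $n$; the function names are hypothetical.

```python
from fractions import Fraction

def uncovered_point(indices):
    """A point of (0, 1) missed by the intervals (1/n, 1) with n in `indices`."""
    m = max(indices)            # the largest interval in the subcollection is (1/m, 1)
    return Fraction(1, m + 1)   # 0 < 1/(m+1) < 1/m, so it lies outside (1/m, 1)

def covered(t, indices):
    """True if t lies in (1/n, 1) for some n in `indices`."""
    return any(Fraction(1, n) < t < 1 for n in indices)

indices = [2, 5, 17, 40]        # an arbitrary finite subcollection
t = uncovered_point(indices)
```

Enlarging the subcollection always covers the old witness but creates a new one, which is another way of seeing that no finite subcollection covers $(0,1)$.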


Compactness. The above discussion leads to the following definition: 


Definition 7.4.3. A subset $K$ of $\mathbb{R}^d$ is called compact if every open cover of $K$
has a finite subcover.


Note that Example 7.4.2 shows that the open interval $(0,1)$ is not compact,
since it has an open cover with no finite subcover.

A subset $E$ of $\mathbb{R}^d$ is bounded if there is a number $R$ such that $\|x\| \le R$ for
every $x \in E$, that is, if $E \subset \overline{B}_R(0)$ for some $R$.


Theorem 7.4.4. Every compact subset $K$ of $\mathbb{R}^d$ is bounded.


Proof. We have $K \subset \mathbb{R}^d = \bigcup_n B_n(0)$. This means that the open balls $B_n(0)$ for
$n = 1, 2, \ldots$ form an open cover of $K$. Since $K$ is compact, finitely many of these
balls must also form a cover of $K$. This implies that $K$ is contained in one of these
balls, say $B_m(0)$, since they form a sequence which is nested upward. Since $K$ is
contained in $B_m(0) \subset \overline{B}_m(0)$, it is bounded. □


Theorem 7.4.5. Every compact subset $K$ of $\mathbb{R}^d$ is closed.


Proof. We will prove this by showing that $K = \overline{K}$. If $x \in \overline{K}$ and $n$ is a positive
integer, we let $U_n$ be the complement in $\mathbb{R}^d$ of $\overline{B}_{1/n}(x)$. The union of the nested
sequence of open sets $\{U_n\}$ is $\mathbb{R}^d \setminus \{x\}$.

If some finite subcollection of $\{U_n\}$ covers $K$, then some one of these sets, say
$U_m$, contains $K$. This means that $\overline{B}_{1/m}(x) \cap K = \emptyset$, which is impossible, since
$x \in \overline{K}$. Because $K$ is compact, this means that $\{U_n\}$ cannot be an open cover of
$K$. Since $x$ is the only point of $\mathbb{R}^d$ not covered by $\{U_n\}$, $x$ must be in $K$.

We conclude that $\overline{K} = K$ and $K$ is closed. □


The Heine-Borel Theorem. The last two theorems show that a compact subset
of $\mathbb{R}^d$ is both closed and bounded. The Heine-Borel Theorem says that the converse
is also true: every closed bounded subset of $\mathbb{R}^d$ is compact. Before we prove this,
we prove the following analogue of the Nested Interval Theorem (Theorem 2.5.1).


Theorem 7.4.6. If $A_1 \supset A_2 \supset \cdots \supset A_n \supset A_{n+1} \supset \cdots$ is a nested sequence of
non-empty bounded closed subsets of $\mathbb{R}^d$, then $\bigcap_n A_n \ne \emptyset$.


Proof. Since each $A_n$ is non-empty, we may choose a point $x_n \in A_n$ for each
$n$. These points are all in $A_1$, which is bounded. Hence, $\{x_n\}$ is a bounded
sequence. By the Bolzano-Weierstrass Theorem (Theorem 7.2.14) this sequence has
a convergent subsequence $\{x_{n_k}\}$. Let $x$ be the limit of this subsequence.

Since $A_1$ is closed and $x_{n_k} \in A_1$ for every $k$, we have that $x \in A_1$. In fact, for
each $n$, $n_k \ge n$ if $k \ge n$, and so, beginning with the $n$th term, each term of the
sequence $\{x_{n_k}\}$ belongs to $A_n$. Since $A_n$ is closed, we have $x \in A_n$. We conclude
that $x \in \bigcap_n A_n$. Hence, $\bigcap_n A_n \ne \emptyset$. □


In the proof of the following theorem, we will make use of the concept of a
$d$-cube in $\mathbb{R}^d$. This is a set of the form $C = I_1 \times I_2 \times \cdots \times I_d$, where each $I_j$ is a
closed bounded interval in $\mathbb{R}$ of length $L$. The intervals $I_j$ are called the edges of
$C$ and the number $L$ is called the edge length of $C$. Note that a 2-cube is just a
square in $\mathbb{R}^2$ with sides parallel to the coordinate axes, while a 3-cube is a cube in
$\mathbb{R}^3$ with edges parallel to the axes.


Theorem 7.4.7 (Heine-Borel Theorem). A subset of $\mathbb{R}^d$ is compact if and only
if it is closed and bounded.


Proof. We already know that every compact subset of $\mathbb{R}^d$ is closed and bounded.
Thus, to complete the proof, we just need to show that every closed bounded subset
of $\mathbb{R}^d$ is compact.

Let $K$ be a closed bounded subset of $\mathbb{R}^d$ and let $\mathcal{V}$ be an open cover of $K$.
Suppose $\mathcal{V}$ has no finite subcover. We will show that this leads to a contradiction.


Figure 7.4.1. Nested Cubes of Theorem 7.4.7.


Since $K$ is bounded, it lies inside some $d$-cube $C_1$. Let $L$ be the edge length
of $C_1$. By partitioning each edge of $C_1$ at its midpoint, we may partition $C_1$ into
$2^d$ $d$-cubes of edge length $L/2$. By intersecting each of these smaller cubes with $K$,
we partition $K$ into finitely many subsets. If each of these is covered by finitely
many of the sets in $\mathcal{V}$, then $K$ itself is also. Since it is not, we conclude that the
intersection of $K$ with at least one of these smaller $d$-cubes is not covered by finitely
many sets in $\mathcal{V}$. Choose one and call it $C_2$.

By continuing in this way (actually, by induction), we may construct a nested
sequence of $d$-cubes (see Figure 7.4.1)

$$C_1 \supset C_2 \supset \cdots \supset C_n \supset C_{n+1} \supset \cdots,$$

where, for each $n$, $C_n$ is a closed $d$-cube of edge length $L/2^{n-1}$ and with the property
that $C_n \cap K$ cannot be covered by finitely many of the sets in $\mathcal{V}$.


The sets $C_n \cap K$ form a sequence of closed, bounded sets, nested downward, as
in the previous theorem. By that theorem $\bigcap_n (C_n \cap K)$ is not empty. Let $x$ be a
point in this intersection. Then $x \in K$ and, since $\mathcal{V}$ is an open cover of $K$, there is
some open set $V$ in the collection $\mathcal{V}$ such that $x \in V$. Since $V$ is open, there is an
open ball $B_r(x)$, centered at $x$, which is contained in $V$.

The diameter of $C_n$ (maximum distance between two points of $C_n$) is at most
$dL/2^{n-1}$. Hence, for large enough $n$, the diameter of $C_n$ is less than $r$. Then $C_n$
must be contained in $B_r(x)$ since it contains $x$. This implies that $C_n \subset V$. This
is a contradiction, since $C_n$ was chosen so that no finite subcollection of the sets
in $\mathcal{V}$ covers $C_n \cap K$. Thus, our assumption that $K$ is not covered by any finite
subcollection of $\mathcal{V}$ has led to a contradiction.

We conclude that every open cover of $K$ has a finite subcover and, hence, that
$K$ is compact. □
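
The two mechanical ingredients of the proof, bisecting a $d$-cube into $2^d$ subcubes and waiting until the diameter bound $dL/2^{n-1}$ drops below $r$, can be sketched in code. This is an illustration under assumed inputs (the unit cube in $\mathbb{R}^3$ and $r = 1/10$), with hypothetical function names; it does not carry out the choice of the subcube that fails to be finitely covered.

```python
from fractions import Fraction
from itertools import product

def bisect_cube(cube):
    """Split a d-cube, given as a list of (lo, hi) edges, into 2**d half-size subcubes."""
    halves = []
    for lo, hi in cube:
        mid = (lo + hi) / 2
        halves.append([(lo, mid), (mid, hi)])
    # Choose the lower or upper half of each edge in every possible way.
    return [list(choice) for choice in product(*halves)]

def steps_until_small(d, L, r):
    """Smallest n with d * L / 2**(n - 1) < r, the diameter bound used in the proof."""
    n = 1
    while d * L / 2 ** (n - 1) >= r:
        n += 1
    return n

cube = [(Fraction(0), Fraction(1))] * 3          # assumed starting cube C_1 in R^3
subcubes = bisect_cube(cube)
n = steps_until_small(d=3, L=Fraction(1), r=Fraction(1, 10))
```

For these inputs the bound $3/2^{n-1}$ first falls below $1/10$ at $n = 6$, so the sixth cube in the nested sequence already fits inside a ball of radius $1/10$ about any of its points.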


Corollary 7.4.8. Each closed subset of a compact set in $\mathbb{R}^d$ is also compact.


Proof. If $A$ is closed and contained in a compact set $K$, then $A$ is bounded because
$K$ is bounded. Since $A$ is closed and bounded, it is compact by the Heine-Borel
Theorem. □


Applications of Compactness. The next chapter will contain a large number 
of applications of compactness to function theory. The next example illustrates a 
technique that is often used in such applications. 


Example 7.4.9. Let $K$ be a compact subset of $\mathbb{R}^d$ and let $p$ be a function defined
on $K$ with $p(x) > 0$ for each $x \in K$. Prove there exists a finite set of points
$\{x_1, x_2, \ldots, x_m\}$ such that $K$ is contained in the union of the open balls $B_{p(x_i)}(x_i)$
for $i = 1, 2, \ldots, m$.

Solution: The collection of open sets $\{B_{p(x)}(x) : x \in K\}$ is an open cover of $K$
(since, for each $y \in K$, $y \in B_{p(y)}(y) \subset \bigcup \{B_{p(x)}(x) : x \in K\}$). Since $K$ is compact,
there is a finite subcover $\{B_{p(x_i)}(x_i) : i = 1, \ldots, m\}$. This means $K$ is contained in
the union of the $B_{p(x_i)}(x_i)$ for $i = 1, 2, \ldots, m$.


The next theorem is an application of this technique. It is a separation theorem
which shows that a compact set is separated from the complement of any open set
that contains it.


Theorem 7.4.10. Suppose $K$ is a compact subset and $U$ is an open subset of
$\mathbb{R}^d$ with $K \subset U$. Then there exists an open set $V$ such that $\overline{V}$ is compact and
$K \subset V \subset \overline{V} \subset U$.


Proof. Since $U$ is open and contains $K$, for each $x \in K$ there is an open ball
centered at $x$ which lies in $U$. Then the ball, centered at $x$, of half this radius
has its closure contained in $U$. Let $p(x)$ be the radius of this smaller ball. Then
$x \in B_{p(x)}(x) \subset \overline{B}_{p(x)}(x) \subset U$. By the previous example, there are finitely many
points $x_1, \ldots, x_m$ such that $K$ is contained in the union $V$ of the sets $B_{p(x_i)}(x_i)$.
The closure of $V$ is contained in the compact set which is the union of the sets
$\overline{B}_{p(x_i)}(x_i)$, and this is contained in $U$. Thus, $\overline{V}$ is compact, since it is a closed
subset of a compact set, and $K \subset V \subset \overline{V} \subset U$. □


Compact Metric Spaces. Since compactness is a topological property, it makes
perfectly good sense in any metric space. The definition of a compact subset of a
metric space $X$ is exactly the same as Definition 7.4.3 except that $\mathbb{R}^d$ is replaced
by $X$. If the space $X$ itself is compact, then $X$ is called a compact metric space.

Any compact subset of $\mathbb{R}^d$ is a compact metric space if it is considered a space
by itself and is given the same metric it has as a subset of $\mathbb{R}^d$.


Exercise Set 7.4


1. If $K$ is a compact subset of $\mathbb{R}^d$ and $U_1 \subset U_2 \subset \cdots \subset U_k \subset \cdots$ is a nested
upward sequence of open sets with $K \subset \bigcup_k U_k$, then prove that $K$ is contained
in one of the sets $U_k$.

2. Let $K$ be a compact subset of $\mathbb{R}^d$ and let $A_1 \supset A_2 \supset \cdots \supset A_k \supset \cdots$ be a
nested downward sequence of closed subsets of $\mathbb{R}^d$. Show that if $A_k \cap K \ne \emptyset$
for each $k$, then $(\bigcap_k A_k) \cap K \ne \emptyset$.

3. Show that if $K_1 \supset K_2 \supset \cdots \supset K_k \supset \cdots$ is a nested downward sequence of
compact sets and $U$ is an open set which contains $\bigcap_k K_k$, then $U$ contains one
of the sets $K_k$.

4. Prove that if $K$ is a compact subset of $\mathbb{R}^d$, then $K$ contains a point of maximal
norm. That is, there is a point $x_1 \in K$ such that
$\|x\| \le \|x_1\|$ for all $x \in K$.
Hint: Set $m = \sup\{\|x\| : x \in K\}$ and consider the open balls $B_{m-1/n}(0)$.

5. Prove that if $K$ is a compact subset of $\mathbb{R}^d$ and $y$ is a point of $\mathbb{R}^d$ which is not
in $K$, then there is a closest point to $y$ in $K$. That is, there is an $x_0 \in K$ such
that
$\|x_0 - y\| \le \|x - y\|$ for all $x \in K$.

6. Prove that the conclusion of the previous exercise also holds if we only assume
that $K$ is a closed subset of $\mathbb{R}^d$. Hint: Replace $K$ by its intersection with a
suitably large closed ball centered at $y$.

7. Prove that if $K_1, K_2$ is a disjoint pair of compact sets, then there exists a
disjoint pair of open sets $V_1, V_2$ such that $K_1 \subset V_1$ and $K_2 \subset V_2$. Hint: Use
Theorem 7.4.10.

8. Prove that a set $K \subset \mathbb{R}^d$ is compact if and only if every sequence in $K$ has
a subsequence which converges to an element of $K$. Hint: Use the Bolzano-Weierstrass
and Heine-Borel Theorems.

9. Show that it is true that the union of any finite collection of compact subsets
of $\mathbb{R}^d$ is compact, but it is not true that the union of an infinite collection of
compact subsets is necessarily compact. Show the latter statement by finding
an example of an infinite union of compact sets which is not compact.

10. Prove that if $A$ and $B$ are compact subsets of a metric space, then $A \cup B$ and
$A \cap B$ are also compact.

11. Prove that if $X$ is a compact metric space, then every sequence in $X$ has a
convergent subsequence.

12. Prove that if $X$ is a compact metric space, then every closed subset of $X$ is also
compact.

13. Prove that a compact metric space is complete (that is, every Cauchy sequence
converges).

14. We will say a metric space $X$ is bounded if, for some $M > 0$ and $x \in X$, the
entire space $X$ is contained in $B_M(x) = \{y \in X : \delta(x,y) < M\}$. Show that a
compact metric space is bounded.

15. Consider the metric space of Exercise 7.2.12. Show that it is complete and
bounded but not compact. Thus, the analogue of the Heine-Borel Theorem
does not hold in general metric spaces.


Figure 7.5.1. Disconnected and Connected Sets (panels A, B, and C).


7.5. Connected Sets 


Consider the three sets $A$, $B$, $C$ described in Figure 7.5.1. Each of these sets is the
union of two closed discs of radius 1 in $\mathbb{R}^2$. In $A$ the distance between the centers of
the two discs is greater than 2; in $B$ it is less than 2; and in $C$ it is exactly 2. The
point about these three sets that we wish to discuss is this: set $A$ is disconnected,
meaning one cannot pass from one of the discs making up this set to the other without
leaving the set. On the other hand, $B$ and $C$ are connected, meaning one can pass from
any point in the set to any other point in the set without leaving the set. As stated
so far, these are not very precise ideas. The precise definition of connectedness is
as follows.


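Definition 7.5.1. A subset $E$ of $\mathbb{R}^d$ is said to be separated by a pair of open sets
$U$ and $V$ in $\mathbb{R}^d$ if

(a) $E \subset U \cup V$;

(b) $(E \cap U) \cap (E \cap V) = \emptyset$;

(c) $E \cap U \ne \emptyset$ and $E \cap V \ne \emptyset$.


If no pair of open subsets of $\mathbb{R}^d$ separates $E$, then we will say that $E$ is connected.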


The above definition becomes somewhat simpler to state if we give a special
name to subsets of $E$ of the form $E \cap U$ where $U$ is an open set.


Definition 7.5.2. Let $E$ be a subset of $\mathbb{R}^d$. A subset $A$ of $E$ is said to be relatively
open (in $E$) if it has the form $A = E \cap U$ for some open subset $U$ of $\mathbb{R}^d$. Similarly,
a subset $B$ is said to be relatively closed (in $E$) if it has the form $E \cap C$ for some
closed subset $C$ of $\mathbb{R}^d$.


Using these concepts, the definition of connectedness can be rephrased as follows.


Remark 7.5.3. A subset $E$ of $\mathbb{R}^d$ is connected if and only if it is not the disjoint
union of two non-empty relatively open subsets.


Connected Subsets of R. The connected subsets of R are easily characterized. 


Theorem 7.5.4. A non-empty subset of R is connected if and only if it is an 
interval. 


Proof. Suppose $E$ is a non-empty subset of $\mathbb{R}$. Let

$$a = \inf E \quad\text{and}\quad b = \sup E.$$

Now $a$ and $b$ may not be finite, but $E$ is certainly contained in the interval consisting
of $(a,b)$ together with $\{a\}$ if $a$ is finite and $\{b\}$ if $b$ is finite. The set $E$ will be an
interval if and only if it contains $(a,b)$.

Suppose $E$ is not an interval. Then there is an $x \in (a,b)$ such that $x \notin E$.
Then $E$ is contained in the set $(-\infty, x) \cup (x, \infty)$. Furthermore, since $a = \inf E$ and
$a < x$, there must be points of $E$ which are less than $x$, that is, $E \cap (-\infty, x) \ne \emptyset$.
Similarly, since $b = \sup E$ and $x < b$, $E \cap (x, \infty) \ne \emptyset$. Thus, by Definition 7.5.1,
the set $E$ is separated by the pair of open sets $(-\infty, x)$ and $(x, \infty)$ and, hence, is
not connected. Thus, if $E$ is connected, it must be an interval.

Conversely, suppose $E$ is an interval. Then $E$ is $(a,b)$ possibly together with
one or both of its endpoints. Suppose $U$ and $V$ are open subsets of $\mathbb{R}$ with
$(U \cap E) \cap (V \cap E) = \emptyset$ and $E \subset U \cup V$. We define a function $f$ on $E$ by $f(x) = 0$
if $x \in E \cap U$ and $f(x) = 1$ if $x \in E \cap V$.

We claim $f$ is a continuous function on the interval $E$. If $x \in E$ and $\epsilon > 0$,
then $x$ is in one of the sets $U$ or $V$. Since they are both open, there is an interval
$(x - \delta, x + \delta)$ which is also contained in whichever of these sets contains $x$. Then $f$
has the same value at any $y \in E \cap (x - \delta, x + \delta)$ that it has at $x$. Thus,

$$|f(x) - f(y)| = 0 < \epsilon \quad\text{whenever } y \in E \text{ and } |x - y| < \delta.$$

This proves that $f$ is continuous on $E$. However, its only possible values are 0 and 1.
By the Intermediate Value Theorem (Theorem 3.2.3) it cannot take on both these
values, since it would then have to take on every value in between. This means one
of the sets $E \cap U$, $E \cap V$ is empty. Hence, $E$ is not separated by $U$ and $V$. We
conclude that no pair of open sets separates $E$ and, hence, $E$ is connected. □
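
The first half of the proof is constructive: a missing point $x$ between $\inf E$ and $\sup E$ yields the separating pair $(-\infty, x)$ and $(x, \infty)$. The sketch below illustrates this for a finite set of reals, which is an assumption made only to keep the example computable; `separating_point` is a hypothetical helper.

```python
def separating_point(points):
    """For a finite set E with at least two points, return an x strictly between
    inf E and sup E with x not in E (here: the midpoint of the two smallest points)."""
    pts = sorted(set(points))
    if len(pts) < 2:
        return None  # a single point is an interval, so nothing separates it
    return (pts[0] + pts[1]) / 2

E = [0, 1, 2, 5]
x = separating_point(E)
left = [t for t in E if t < x]    # E ∩ (-inf, x)
right = [t for t in E if t > x]   # E ∩ (x, inf)
```

Both pieces are non-empty and together they exhaust $E$, so $E$ is separated by $(-\infty, x)$ and $(x, \infty)$ in the sense of Definition 7.5.1.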


If $L$ is a straight line in $\mathbb{R}^d$, then the intersection of an open ball in $\mathbb{R}^d$ with $L$
is an open interval in $L$ (or is empty). It follows that the relatively open subsets
of $L$ are exactly the open subsets of $L$ considered as a copy of $\mathbb{R}$. It follows from
the above theorem that intervals in $L$ are connected subsets of $\mathbb{R}^d$. Thus, the line
segment joining two points in $\mathbb{R}^d$ is a connected set.


Connected Components. 


Theorem 7.5.5. If $A$ and $B$ are connected subsets of $\mathbb{R}^d$ and $A \cap B \ne \emptyset$, then
$A \cup B$ is also connected.


Proof. Suppose $U$ and $V$ are disjoint relatively open subsets of $A \cup B$ such that
$A \cup B = U \cup V$. Then $U \cap A$ and $V \cap A$ are disjoint relatively open subsets of $A$.
Since $A$ is connected, $U$ and $V$ cannot both have a non-empty intersection with
$A$. Since $A$ is contained in their union and cannot meet both of them, $A$ must be
contained in either $U$ or $V$. Similarly, $B$ must be contained in either $U$ or $V$. Since
$U$ and $V$ are disjoint and $A$ and $B$ are not, $A$ and $B$ must be contained in the same
one of the sets $U$, $V$ and must both be disjoint from the other. Since $U \cup V = A \cup B$,
one of the sets $U$, $V$ is empty. This shows that $U$ and $V$ do not separate $A \cup B$.
Hence, $A \cup B$ is connected. □


Figure 7.5.2. A Piecewise Linear Path in $E$.


Basically the same argument shows that the union of any collection of connected
sets with at least one point in common is also connected (Exercise 7.5.6). In
particular, if $x \in E$ where $E$ is some subset of $\mathbb{R}^d$, then the union of all connected
subsets of $E$ containing $x$ is itself connected. Thus, for each point $x \in E$ there is a
connected subset of $E$ which contains all connected subsets containing $x$, that is,
a maximal connected subset containing $x$.


Definition 7.5.6. If $E$ is a subset of $\mathbb{R}^d$ and $x \in E$, then the union of all connected
subsets of $E$ containing $x$ is called the connected component of $E$ containing $x$.


Clearly, the connected components of $E$ are the maximal connected subsets of
$E$. Any two distinct components are disjoint since, otherwise, their union would be
a connected set larger than at least one of them. Two points $x$ and $y$ of $E$ are in
the same component of $E$ if and only if there is some connected subset of $E$ that
contains both $x$ and $y$. In particular, if the line segment joining two points $x$ and
$y$ of $E$ also lies in $E$, then $x$ and $y$ are in the same connected component of $E$.

Since every point in an open or closed ball is joined by a line segment to the 
center of the ball, we have 


Theorem 7.5.7. Every open or closed ball in $\mathbb{R}^d$ is a connected set.


More generally, a piecewise linear path joining $x$ and $y$ in $E$ is a finite set of
line segments $\{[x_{i-1}, x_i]\}_{i=1}^m$, each contained in $E$, with each line segment beginning
where the preceding one ends and with $x_0 = x$ and $x_m = y$. One easily proves by
induction that the union of the line segments in such a path is a connected set (see
Figure 7.5.2). It follows that


Theorem 7.5.8. If $E$ is a subset of $\mathbb{R}^d$ and $x$ and $y$ are points of $E$ that may be
joined by a piecewise linear path in $E$, then $x$ and $y$ are in the same component of
$E$. If every pair of points in $E$ can be joined by a piecewise linear path in $E$, then
$E$ is connected.


Figure 7.5.3. A Set with Infinitely Many Components.


Example 7.5.9. Find a subset of $\mathbb{R}^2$ with infinitely many components.


Solution: This is easy. The set of integers on the $x$-axis is such a set. Since
the only connected subsets of this set are the single point subsets, each point is a
component. A more complicated example is illustrated in Figure 7.5.3. The vertical
lines that touch the bottom horizontal line together with this horizontal line form
one component, while each of the shorter vertical lines is itself a component.


Components of an Open Set. 


Theorem 7.5.10. If $U$ is an open subset of $\mathbb{R}^d$, then each of its connected
components is also open.


Proof. Let $V$ be a connected component of the open set $U$ and let $x$ be a point of
$V$. Since $U$ is open, there is an open ball $B_r(x)$, centered at $x$, such that $B_r(x) \subset U$.
Since $V$ is the union of all connected subsets of $U$ containing $x$ and since $B_r(x)$ is
connected, it must be true that $B_r(x) \subset V$. Since every point of $V$ is the center of
an open ball contained in $V$, the set $V$ is open. □


The components of an open set $U$ form a pairwise disjoint family of open
connected subsets of $U$ with union $U$. Conversely:


Theorem 7.5.11. If an open set $U$ can be written as the union of a pairwise disjoint
family $\mathcal{V}$ of open connected subsets, then these subsets must be the components of
$U$.


Proof. If $V$ is one of the open sets in $\mathcal{V}$, then $V$ must have non-empty intersection
with at least one component of $U$. Call it $C$. Then $V \subset C$ since $V$ is a connected
set containing a point of the component $C$.

We must also have $C \subset V$, since, otherwise, $V$ and the union of all the sets in
$\mathcal{V}$ other than $V$ would be two open sets which separate $C$. Thus, $V = C$.

We now have that every set in $\mathcal{V}$ is a component of $U$. Since the union of the
sets in $\mathcal{V}$ is $U$, every component of $U$ must occur in $\mathcal{V}$. This completes the proof. □


Example 7.5.12. What are the components of the complement of the set $D \cup E$
where

$$D = \{(x,y) \in \mathbb{R}^2 : \|(x+1, y)\| = 1\} \quad\text{and}\quad E = \{(x,y) \in \mathbb{R}^2 : \|(x-1, y)\| = 1\}.$$

Solution: The complement of $D \cup E$ is the union of the open sets

$$A = \{(x,y) \in \mathbb{R}^2 : \|(x+1, y)\| < 1\},$$
$$B = \{(x,y) \in \mathbb{R}^2 : \|(x-1, y)\| < 1\}, \text{ and} \qquad (7.5.1)$$
$$C = \{(x,y) \in \mathbb{R}^2 : \|(x+1, y)\| > 1 \text{ and } \|(x-1, y)\| > 1\}.$$

These three sets are pairwise disjoint and each of them is connected. Hence, they
must be the components of the complement of $D \cup E$, by the previous theorem.
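
The three components in (7.5.1) are described by simple inequalities, so membership can be decided pointwise. The following sketch uses a hypothetical function `component` that returns the label of the component containing a point, or `None` for a point on one of the circles; the exact floating-point equality test for the circles is an assumption that is only reliable for inputs whose distances are computed exactly.

```python
import math

def component(x, y):
    """Label ('A', 'B', or 'C') of the set in (7.5.1) containing (x, y),
    or None if the point lies on D or E."""
    left = math.hypot(x + 1, y)    # distance from (x, y) to (-1, 0)
    right = math.hypot(x - 1, y)   # distance from (x, y) to (1, 0)
    if left == 1 or right == 1:
        return None                # on one of the two circles
    if left < 1:
        return "A"                 # inside the circle centered at (-1, 0)
    if right < 1:
        return "B"                 # inside the circle centered at (1, 0)
    return "C"                     # outside both circles

labels = [component(-1, 0), component(1, 0.5), component(0, 2), component(0, 0)]
```

The origin is the point where the two circles touch, so it belongs to $D \cup E$ and to none of the three components.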


Exercise Set 7.5


In the first four exercises below, tell whether or not the set $A$ is connected. If $A$ is
not connected, describe its connected components. Justify your answers.
1A (C9) €R?: lon < Una) ERAS # $2, y= 0} 

WER? |[(x,y)|| < 1}U{(e,y) € R21 <2 $2, y= 0}. 

@,y) €R?:1< ||(x,y)|| < 2}. 

2,9) ER? 1 < |l(ey) |] < FU Ley) ER? = [|(e,9)|] <1}. 

5. What are the connected components of the complement of the set of integers
in $\mathbb{R}$?

6. Prove that the union of a collection of connected subsets of $\mathbb{R}^d$ with a point in
common is also connected.


7. Which subsets of R are both compact and connected? Justify your answer. 


8. Give an example of two connected subsets of $\mathbb{R}^2$ whose intersection is not
connected.


9. Prove that if $E$ is an open connected subset of $\mathbb{R}^d$, then each pair of points in
$E$ can be connected by a piecewise linear path in $E$. Hint: Fix a point $x_0 \in E$
and consider two sets: (1) the set $U$ of all points in $E$ that can be connected to
$x_0$ by a piecewise linear path in $E$ and (2) the set $V$ of points in $E$ that cannot
be connected to $x_0$ by a piecewise linear path in $E$.


10. Prove that the closure of a connected set is connected. 
11. Is the interior of a connected set necessarily connected? Justify your answer. 
12. Are the components of a closed set necessarily closed? Justify your answer. 


13. Connected sets in a metric space (or any topological space) are defined in the
same way as they are in R^d. Is it true in general for metric spaces that open
balls are connected?




14. A subset of a metric space is said to be totally disconnected if its components 
are all single points. Find a compact, totally disconnected subset of R which is 
not a finite set. 

15. Find a compact, totally disconnected subset of R (see the previous exercise)
which has no isolated points (a point x ∈ E is an isolated point of E if {x} is
relatively open in E; that is, if there is an open set U such that U ∩ E = {x}).


Chapter 8 


Functions on Euclidean Space 


In this chapter we begin the study of functions defined on a subset of the Euclidean
space R^p with values in the Euclidean space R^q. Our first objective is to define and
study continuity for such functions.


8.1. Continuous Functions of Several Variables 


For two natural numbers p and q, we shall study functions F, defined on a subset D
of R^p and with values in R^q. Such a function is sometimes called a transformation
from D to R^q. We will denote this situation by F : D → R^q. The definition of
continuity in this context follows the familiar pattern.


Definition 8.1.1. Let D be a subset of R^p and let F : D → R^q be a function. We
say that F is continuous at a ∈ D if for each ε > 0 there is a δ > 0 such that

\[ \|F(x) - F(a)\| < \epsilon \quad\text{whenever } x \in D \text{ and } \|x - a\| < \delta. \]

If F is continuous at each point of D, then F is said to be continuous on D.


Note that this definition depends very much on the domain D of the function
due to the fact that the condition on ||F(x) − F(a)|| is only required to hold for
x ∈ D. If the domain of the function is changed, then what it means for a function
to be continuous at a may change even if a is in both domains.


Example 8.1.2. The function f : R^2 → R which is 1 on the closed ball B̄_1(0) and 0
everywhere else is clearly not continuous at boundary points of B̄_1(0). Show that, if
the domain of f is changed to B̄_1(0), then the new function is continuous on all of B̄_1(0).

Solution: The new function is just the function which is identically 1 on its
domain and, hence, is continuous at each point of its domain, including points of
the boundary.




Example 8.1.3. Consider the function f : R^2 → R defined by

\[ f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\[1ex] 0 & \text{if } (x,y) = (0,0). \end{cases} \]

Show that f is not continuous at (0,0).

Solution: This function has the value 0 at (0,0), but every disc centered at
(0,0) contains points of the form (x,x) with x ≠ 0 and, at such a point, f has the
value 1/2. So the condition for continuity at (0,0) will not be satisfied when ε is
1/2 or less.
Example 8.1.4. Show that the function with domain R^2 defined by

\[ f(x,y) = \begin{cases} \dfrac{xy}{\sqrt{x^2+y^2}} & \text{if } (x,y) \neq (0,0), \\[1ex] 0 & \text{if } (x,y) = (0,0) \end{cases} \]

is continuous at (0,0).

Solution: Since (x + y)^2 ≥ 0 and (x − y)^2 ≥ 0, it follows that −2xy ≤ x^2 + y^2
and 2xy ≤ x^2 + y^2. Taken together, these two inequalities imply that

\[ 2|xy| \le x^2 + y^2. \]

On dividing by 2√(x^2 + y^2), for (x,y) ≠ (0,0), this becomes

\[ |f(x,y) - f(0,0)| = \frac{|xy|}{\sqrt{x^2+y^2}} \le \frac{1}{2}\sqrt{x^2+y^2} = \frac{1}{2}\,\|(x,y) - (0,0)\|. \]

Thus, given ε > 0, if δ = 2ε, then

\[ |f(x,y) - f(0,0)| < \epsilon \quad\text{whenever } \|(x,y) - (0,0)\| < \delta. \]

We conclude that f is continuous at (0,0).
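
The estimate in Example 8.1.4 can be spot-checked numerically. The following is a minimal sketch rather than part of the text: the names `f` and `bound_holds`, the sample size, and the random seed are illustrative assumptions. It samples points with ||(x,y)|| < δ = 2ε and checks that |f(x,y)| < ε at each of them.

```python
import math
import random


def f(x, y):
    """f(x, y) = xy / sqrt(x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)


def bound_holds(eps, trials=10_000, seed=0):
    """Sample points with ||(x, y)|| < delta = 2 * eps and check |f(x, y)| < eps."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    delta = 2.0 * eps
    for _ in range(trials):
        radius = 0.999 * delta * rng.random()  # stays strictly inside the disc
        angle = 2.0 * math.pi * rng.random()
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        if not abs(f(x, y)) < eps:
            return False
    return True
```

A call such as `bound_holds(0.1)` only samples finitely many points, so it illustrates the inequality |f(x,y)| ≤ ||(x,y)||/2 without replacing the argument above.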


Vector-valued Functions. The previous two examples involved real-valued
functions. We will also be concerned with functions with values in R^q for some natural
number q > 1. Given such a function F with domain D ⊂ R^p, for each x ∈ D let
f_j(x) = e_j · F(x) be the jth component of the vector F(x) ∈ R^q. Then each f_j is
a real-valued function on D. We will sometimes denote the function F by

\[ F(x) = (f_1(x), f_2(x), \ldots, f_q(x)). \]

The real-valued function f_j is called the jth component function of F.


Theorem 8.1.5. A function F : D → R^q is continuous at a point a ∈ D if and
only if each of its component functions is continuous at a.


Proof. It follows from Theorem 7.1.13 that, for each k and each x ∈ D,

\[ |f_k(x) - f_k(a)| \le \|F(x) - F(a)\| \le \sum_{j=1}^{q} |f_j(x) - f_j(a)|. \]

Given ε > 0, it follows from the first inequality that if ||F(x) − F(a)|| < ε, then
also |f_k(x) − f_k(a)| < ε for each k. Hence, if F is continuous at a, then so is each
f_k. It follows from the second inequality that if |f_j(x) − f_j(a)| < ε/q for each j,
then ||F(x) − F(a)|| < ε. This implies that if each f_j is continuous at a, then so
is F. □
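
The two inequalities used in this proof can be checked on sample vectors. The sketch below is an illustration only: the helper names `norm` and `sandwich_holds`, the dimension 4, the tolerance, and the random seed are assumptions made for the example. Each pair of vectors plays the role of F(x) and F(a).

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def sandwich_holds(Fx, Fa, tol=1e-12):
    """Check max_k |f_k(x) - f_k(a)| <= ||F(x) - F(a)|| <= sum_j |f_j(x) - f_j(a)|."""
    diff = [u - v for u, v in zip(Fx, Fa)]
    largest = max(abs(c) for c in diff)
    total = sum(abs(c) for c in diff)
    return largest <= norm(diff) + tol and norm(diff) <= total + tol


rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility
samples = [
    ([rng.uniform(-1, 1) for _ in range(4)], [rng.uniform(-1, 1) for _ in range(4)])
    for _ in range(1000)
]
all_ok = all(sandwich_holds(u, v) for u, v in samples)
```

The small tolerance `tol` only absorbs floating-point rounding; the inequalities themselves are exact.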


Sequences and Continuity. Recall that Theorem 3.1.6 says that a function f 
of one variable is continuous at a point a of its domain D if and only if it takes 
sequences in D which converge to a to sequences which converge to f(a). The same 
theorem is true of functions of several variables. In fact, it is true of any function 
from one metric space to another. The proof is also the same and we won’t repeat 
it. 


Theorem 8.1.6. Let D be a subset of R^p, let a ∈ D, and let F : D → R^q be
a transformation. Then F is continuous at a if and only if, whenever {x_n} is a
sequence in D which converges to a, the sequence {F(x_n)} converges to F(a).


If F and G are two functions with domain D ⊂ R^p and with values in R^q and if
h is a real-valued function with domain D, then we can define new functions hF,
F + G, and F · G by

\[ \begin{aligned}
(hF)(x) &= h(x)F(x), \\
(F + G)(x) &= F(x) + G(x), \\
(F \cdot G)(x) &= F(x) \cdot G(x).
\end{aligned} \tag{8.1.1} \]


Theorems 7.2.12 and 8.1.6 combine to prove the following theorem. The details 
are left to the exercises. 


Theorem 8.1.7. With F, G, h, and D as above, if F, G, and h are continuous
at a ∈ D, then so are hF, F + G, and F · G.


Composition of Functions. If G : D → R^p is a function with domain D ⊂ R^d
and F : E → R^q is a function with domain E ⊂ R^p, then F(G(x)) is defined as
long as x ∈ D and G(x) ∈ E. Thus,

\[ (F \circ G)(x) = F(G(x)) \]

defines a function with domain D ∩ G^{-1}(E) and with values in R^q. This is the
composition of the function G with the function F.

The following theorem follows immediately from two applications of Theorem 
8.1.6. The details are left to the exercises. 


Theorem 8.1.8. With F and G as above, if a ∈ D ∩ G^{-1}(E), G is continuous at
a, and F is continuous at G(a), then F ∘ G is continuous at a.


Limits. Whether or not a function F is defined at a point a ∈ R^p, it may have
a limit as x approaches a. In order for this concept to make sense, it must be the
case that there are points of the domain of F which are arbitrarily close but not
equal to a.

If D is a subset of R^p and a ∈ R^p, then we will say that a is a limit point of D
if every neighborhood of a contains points of D different from a (note that a may
or may not be in D).




Definition 8.1.9. If D ⊂ R^p, a is a limit point of D, and F : D → R^q is a function
with domain D, then we will say that the limit of F as x approaches a is b if, for
each ε > 0, there is a δ > 0 such that

\[ \|F(x) - b\| < \epsilon \quad\text{whenever } x \in D \text{ and } 0 < \|x - a\| < \delta. \]

In this case, we write lim_{x→a} F(x) = b.


If we compare this definition with the definition of continuity at a (Definition
8.1.1), we see that a function F : D → R^q is continuous at a point a ∈ D which is
a limit point of D if and only if lim_{x→a} F(x) = F(a).

On the other hand, if a ∈ D but a is not a limit point of D, then a function F
with domain D is automatically continuous at a (since, for small enough δ, there
are no points x ∈ D with ||x − a|| < δ other than x = a), but the limit of F as x
approaches a is not defined. A point of D which is not a limit point of D is called
an isolated point of D. For example, the set D = B_1((0,0)) ∪ {(1,1)} is a subset of
R^2 with (1,1) as an isolated point.

Note that Examples 8.1.3 and 8.1.4 show that

\[ \lim_{(x,y)\to(0,0)} \frac{xy}{\sqrt{x^2+y^2}} = 0, \]

while

\[ \lim_{(x,y)\to(0,0)} \frac{xy}{x^2+y^2} \]

does not exist. In fact, this function has limit

\[ \frac{a}{1+a^2} \]

as (x,y) approaches (0,0) along the line y = ax. Since the function approaches
different numbers as (x,y) approaches (0,0) from different directions, the limit
does not exist.
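
The dependence on direction is easy to see numerically. In the sketch below (the names `g` and `limit_along_line` are hypothetical, chosen for this illustration), the function xy/(x^2 + y^2) is evaluated at points (t, at) on the line y = ax; the value does not depend on t and equals a/(1 + a^2).

```python
def g(x, y):
    """g(x, y) = xy / (x^2 + y^2), defined away from the origin."""
    return x * y / (x * x + y * y)


def limit_along_line(a):
    """The value a / (1 + a^2) that g takes at every point (t, a*t) with t != 0."""
    return a / (1.0 + a * a)


# Along each line y = a*x the value of g is constant, so it is also the limit
# of g along that line as t -> 0.
for a in (0.0, 1.0, -1.0, 2.0):
    for t in (1e-1, 1e-3, 1e-6):
        assert abs(g(t, a * t) - limit_along_line(a)) < 1e-12
```

Since `limit_along_line(1.0)` and `limit_along_line(2.0)` differ, no single number can serve as the limit at the origin.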


Curves and Surfaces. A continuous function γ : I → R^q, where I is an interval
in R, is called a parameterized curve with parameter interval I. The variable t in
γ(t) is called the parameter for the curve. Intuitively, as t ranges through the
parameter interval, γ(t) traces out something like a curved line in R^q.

If the parameter interval I is a closed bounded interval [a, b] with γ(a) = x and
γ(b) = y, then γ is called a curve in R^q joining x to y. The points x and y are
called the endpoints of the curve. If x = y, then γ is called a closed curve.


Example 8.1.10. Give examples of a closed curve, a curve with endpoints which 
is not closed, and a curve with no endpoints. 


Solution: The curve γ(t) = (cos t, sin t), t ∈ [0, 2π], is a closed curve in R^2. It
is closed because γ(0) = (1,0) = γ(2π).

The curve γ(t) = (t^2, t^3), t ∈ [0, 1], is a curve joining x = (0,0) and y = (1,1).
It has these points as endpoints. It is not closed, since the endpoints are not the
same.

The curve γ(t) = (t cos t, t sin t, t), t ∈ (−∞, ∞), is a spiral curve in R^3 with no
endpoints.
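
The first two curves can be compared by evaluating them at the ends of their parameter intervals. This is a small sketch under stated assumptions: the function names, the tolerance, and the reading of the second curve as (t^2, t^3) are choices made for the illustration.

```python
import math


def circle(t):
    """The curve (cos t, sin t) on the parameter interval [0, 2*pi]."""
    return (math.cos(t), math.sin(t))


def cusp(t):
    """The curve (t^2, t^3) on the parameter interval [0, 1]."""
    return (t ** 2, t ** 3)


def is_closed(curve, a, b, tol=1e-9):
    """A curve on [a, b] is closed when its endpoints curve(a) and curve(b) agree."""
    return math.dist(curve(a), curve(b)) <= tol
```

The tolerance is needed only because `math.sin(2 * math.pi)` is a tiny nonzero floating-point number rather than exactly 0.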




Generally, a curve is a one-dimensional object, but there are exceptions. A
curve may be degenerate; that is, γ(t) may be a constant vector in R^q. Then the
image of γ is a single point, which is a zero-dimensional object.


A parameterized surface in R^q (q > 2) is a continuous function F : A → R^q,
where A is an open subset of R^2 or an open subset of R^2 together with all or part
of the boundary of this open subset.


Example 8.1.11. Give three examples of parameterized surfaces. 
Solution: The image of the surface

\[ F(\theta,\phi) = (\cos\theta\cos\phi,\ \sin\theta\cos\phi,\ \sin\phi) \quad\text{with } \theta \in [0,2\pi),\ \phi \in [-\pi/2,\pi/2] \]

is the sphere of radius 1 centered at the origin. The parameter set A in this case is
the rectangle [0, 2π) × [−π/2, π/2]. The parameterization is the one given by expressing
the sphere in spherical coordinates, with φ measured as latitude so that the third
coordinate sin φ runs over [−1, 1]. Note that this sphere is just B̄_1(0) \ B_1(0) and,
hence, is a closed set (Exercise 7.3.6) even though its parameter set is not closed.


The closed upper half of the above sphere may be parameterized as above but
with parameter set [0, 2π) × [0, π/2] or it may be parameterized by

\[ G(x,y) = \left(x,\ y,\ \sqrt{1 - x^2 - y^2}\right) \quad\text{with } x^2 + y^2 \le 1. \]

Here, the set A is the closed disc of radius 1 centered at the origin in R^2.


If we change the parameter set for G in the above example to the open disc
of radius 1 centered at 0, then we obtain a surface which is not a closed set: the
upper half of the unit sphere not including the circle {(x,y,z) : x^2 + y^2 = 1, z = 0}.


Generally, the image of a parameterized surface is a two-dimensional object,
but there are exceptions. A surface may be degenerate. The parameter function F
could have image contained in a set of dimension less than 2; it could be a point
or a curve. For example, the image of

\[ F(u,v) = (\cos(u+v),\ \sin(u+v),\ u+v) \quad\text{with } (u,v) \in \mathbb{R}^2 \]

is actually the spiral curve (cos t, sin t, t), as we can see by making the substitution
t = u + v.

Conditions that guarantee that a curve or surface is not degenerate will be
obtained in the next chapter.
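
The degeneracy of this last surface can be seen by evaluating it: the value of F(u,v) depends only on the sum u + v. A minimal sketch follows, with function names chosen for the illustration rather than taken from the text.

```python
import math


def F(u, v):
    """The degenerate surface (cos(u+v), sin(u+v), u+v) on R^2."""
    s = u + v
    return (math.cos(s), math.sin(s), s)


def helix(t):
    """The spiral curve (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)


def on_helix(u, v, tol=1e-12):
    """Check that F(u, v) is the point of the spiral curve with parameter t = u + v."""
    return math.dist(F(u, v), helix(u + v)) <= tol
```

For instance, the parameter pairs (0.25, 0.5) and (1.0, -0.25) both have sum 0.75 and are sent to the same point, so a two-parameter family collapses onto a one-parameter curve.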


1. Consider the function f : R? + R defined by 
xy? 


Haylee # @MAOO: 
0 if (x,y) =(0,0). 
Is this function continuous at (0,0)? Justify your answer. 
2. Give a simple reason why the function γ : R → R^3 defined by

\[ \gamma(t) = (t,\ \sin t,\ e^t) \]

is continuous on R.


3. Does the function f : R^2 \ {(0,0)} → R, defined by
f(y) = Nees 


Pty 


have a limit as (x,y) approaches (0,0)? Justify your answer.


4. Consider the function f : R^2 → R defined by


_ fay if cy>, 
seod= 1 if cy <0. 


At which points of R? is this function continuous? 


5. For the function f : R^2 → R defined by


a 


few) = aye 


show that f has limit 0 as (x,y) → (0,0) along any straight line through the
origin but that it does not have a limit as (x,y) → (0,0) in R^2.


6. Consider the function f : R^2 → R defined by


ge 


y pee a 
fey) = pom TURE 


e if y=a?. 


At which points of R? is this function continuous? 


7. Prove Theorem 8.1.7. 
8. Prove Theorem 8.1.8. 


9. Prove that a is a limit point of a set D ⊂ R^p if and only if there is a sequence
of points in D but not equal to a which converges to a.

10. Let D be a subset of R^p and let F : D → R^q be a function. If a is a limit point
of D, prove that lim_{x→a} F(x) = b if and only if lim_{n→∞} F(x_n) = b whenever
{x_n} is a sequence in D which converges to a.

11. Let F : D → R^q be a transformation with domain D ⊂ R^p and let a be a limit
point of D. Prove that if {F(x_n)} converges whenever {x_n} is a sequence in D
which converges to a, then lim_{x→a} F(x) exists.

12. Let B_1(0) be the open unit ball in R^2. Is it true that every continuous function
f : B_1(0) → R takes Cauchy sequences to Cauchy sequences?

13. Let B̄_1(0) be the closed unit ball in R^2. Is it true that every continuous function
f : B̄_1(0) → R takes Cauchy sequences to Cauchy sequences?

14. Find a parameterized curve γ(t) in R^2, with parameter interval [0, ∞), that
begins at (1,0), spirals inward in the counterclockwise direction, and approaches
(0,0) as t → ∞.

15. Find a parameterization of the cylindrical surface in R^3 defined by the equation
x^2 + y^2 = 1 (z is unrestricted). That is, find a continuous function F : A → R^3
with A ⊂ R^2, such that F has the cylinder as image.




8.2. Properties of Continuous Functions 


The theme of this section is that continuous functions are the functions that behave 
well with respect to topological properties of sets. 


Continuity and Open and Closed Sets. Recall that if D is a subset of R^p, then
a relatively open subset of D is a set of the form U ∩ D, where U is open in R^p. The
relatively open subsets of D are the open subsets of D considered as a metric space
by itself (rather than a subset of R^p). Relatively closed sets are defined analogously.


Theorem 8.2.1. If D ⊂ R^p and if F : D → R^q is a function, then F is continuous
on D if and only if F^{-1}(U) is a relatively open subset of D whenever U is an open
subset of R^q. Equivalently, F is continuous if and only if F^{-1}(A) is a relatively
closed subset of D whenever A is a closed subset of R^q.


Proof. Suppose F is continuous and U is an open subset of R^q. If a ∈ F^{-1}(U),
then b = F(a) ∈ U. Since U is open, there is an ε > 0 such that B_ε(b) ⊂ U. Since
F is continuous on D, there is a δ > 0 such that

\[ \|F(x) - F(a)\| < \epsilon \quad\text{whenever } x \in D \text{ and } \|x - a\| < \delta. \]

This implies that F(B_δ(a) ∩ D) ⊂ B_ε(b) ⊂ U and, hence, that

\[ B_\delta(a) \cap D \subset F^{-1}(U). \]

Since we can do this at each a ∈ F^{-1}(U), we conclude that F^{-1}(U) is the intersection
of D with the union of the resulting collection of open balls B_δ(a). Hence, it
is relatively open in D.

On the other hand, suppose F^{-1}(U) is relatively open in D for each open set
U in R^q. In particular, this implies that if a ∈ D, b = F(a), and ε > 0, then the
set F^{-1}(B_ε(b)) is relatively open in D. Thus,

\[ F^{-1}(B_\epsilon(b)) = D \cap V \]

for some open set V ⊂ R^p. Since a ∈ V and V is open, there is a δ > 0 such that
B_δ(a) ⊂ V. Then x ∈ D and ||x − a|| < δ implies x ∈ V ∩ D = F^{-1}(B_ε(b)). This
means that

\[ \|F(x) - F(a)\| < \epsilon \quad\text{whenever } x \in D \text{ and } \|x - a\| < \delta. \]

Hence, F is continuous at a. Since this is true for all points a ∈ D, we conclude
that F is continuous on D.


The analogous result for closed sets follows from the above by taking complements
and using the fact that a subset of D is relatively closed if and only if it is
the complement in D of a set which is relatively open. The details are left to the
exercises. □


If D is open, then the relatively open subsets of D are just the open subsets of 
D. Hence, we have the following corollary of the above theorem. 


Corollary 8.2.2. If D ⊂ R^p is open and F : D → R^q is a function, then F is
continuous on D if and only if F^{-1}(U) is open for every open set U ⊂ R^q.




Continuity and Compactness. The proof of the following theorem is very sim- 
ple, but it has a lot of very useful consequences. 


Theorem 8.2.3. If K is a compact subset of R^p and F : K → R^q is a continuous
function, then F(K) is a compact subset of R^q.


Proof. Let 𝒰 be an open cover of F(K) and let 𝒱 be the collection of all open
subsets V ⊂ R^p such that V ∩ K = F^{-1}(U) for some U ∈ 𝒰. There is at least
one such V for each U ∈ 𝒰 since F^{-1}(U) is relatively open in K by the previous
theorem.


Since 𝒰 is a cover of F(K), 𝒱 is an open cover of K. Since K is compact, there
is a finite subcollection {V_j}_{j=1}^n of 𝒱 which also covers K. For each V_j we may
choose a U_j ∈ 𝒰 such that V_j ∩ K = F^{-1}(U_j).

If y ∈ F(K), then y = F(x) for some x ∈ K. This x belongs to V_j ∩ K for some
j because {V_j}_{j=1}^n is a cover of K. Then y ∈ U_j. This proves that the collection
{U_j}_{j=1}^n is a cover of F(K). It is, in fact, a finite subcover of 𝒰. Since we can do
this for every open cover of F(K), we have proved that F(K) is compact. □


A function F : D → R^q is said to be bounded on D if there is a number M such
that

\[ \|F(x)\| \le M \quad\text{for all } x \in D. \]

That is, F is bounded on D if the set of non-negative numbers {||F(x)|| : x ∈ D}
is bounded above. The least upper bound of this set is denoted sup_D ||F(x)||. It
may or may not be a member of the set; that is, there may or may not be a point
x_0 ∈ D such that ||F(x_0)|| = sup_D ||F(x)||. If there is such a point x_0, then we say
that ||F(x)|| assumes a maximum value on D.


A compact set contains points of maximal norm and points of minimal norm
(Exercise 7.4.4). Combining this with the previous theorem yields the following:


Theorem 8.2.4. If K ⊂ R^p is compact and F : K → R^q is continuous, then F is
bounded on K and ||F(x)|| assumes a maximum value on K.


Proof. By the previous theorem, F(K) is compact and, hence, bounded. Furthermore,
it contains a point of maximum norm by Exercise 7.4.4. This point is in
F(K) and so it has the form F(x_0) for some x_0 ∈ K. □


Corollary 8.2.5. If K ⊂ R^p is compact and f : K → R is a continuous real-valued
function on K, then f assumes a maximal value and a minimal value on K.


Proof. Since K is compact, the previous theorem implies that {|f(x)| : x ∈ K}
is bounded above by some number M. Then the function g(x) = f(x) + M is a
non-negative function and so |g(x)| = g(x). By the previous theorem, there is a
point x_0 ∈ K with

\[ g(x) \le g(x_0) \quad\text{for all } x \in K. \]

Since f(x) = g(x) − M, it follows that x_0 is a point at which f achieves its maximal
value.

Since the above argument applies equally well to −f(x) and since a maximum
for −f(x) on K will be the negative of a minimum for f(x) on K, it follows that
f(x) has a minimum value on K as well. □




Example 8.2.6. Let K be a compact subset of R^p. Show that if f : K → R is a
real-valued continuous function on K which is strictly positive at each point of K,
then there is a number δ > 0 such that f(x) ≥ δ for all x ∈ K.


Solution: By Corollary 8.2.5, the function f has a minimum value δ on K.
This minimum value cannot be 0, since f is positive at all points of K. Thus, δ > 0
and f(x) ≥ δ for all x ∈ K.


Continuity and Connectedness. Continuous functions also take connected sets 
to connected sets. 


Theorem 8.2.7. If D ⊂ R^p is connected and F : D → R^q is continuous, then
F(D) is also connected.


Proof. Suppose U and V are open subsets of R^q such that F(D) ⊂ U ∪ V and
(U ∩ F(D)) ∩ (V ∩ F(D)) = ∅. Then F^{-1}(U) and F^{-1}(V) are relatively open subsets
of D, F^{-1}(U) ∩ F^{-1}(V) = ∅, and D ⊂ F^{-1}(U) ∪ F^{-1}(V). Thus, one of the sets
F^{-1}(U) ∩ D and F^{-1}(V) ∩ D must be empty since, otherwise, they would separate
D. However, if F^{-1}(U) ∩ D = ∅, then U ∩ F(D) = ∅ and a similar statement holds
for V. Thus, either U or V has empty intersection with F(D), which implies that
the two sets do not separate F(D). Hence, F(D) is connected. □


The following is the several variable version of the Intermediate Value Theorem,
since it says that if a continuous real-valued function on a connected set takes on
two values, it also takes on every value in between the two.


Corollary 8.2.8. If D ⊂ R^p is connected and f : D → R is a continuous function,
then f(D) is an interval.


Proof. By the previous theorem, f(D) is a connected subset of the line R. By
Theorem 7.5.4 the only such sets are intervals. □


Now suppose E is a subset of R^q and γ : I → E is a parameterized curve with
parameter interval I = [a, b]. Since I is connected by Theorem 7.5.4, its image γ(I)
is a connected subset of E by Theorem 8.2.7. Thus, if x = γ(a) and y = γ(b), then
x and y must be in the same component of E. Thus, we have proved the following.


Theorem 8.2.9. If E is a subset of R^q and x and y are points of E that may be
joined by a curve in E, then x and y are in the same connected component of E.
If each pair of points of E may be joined by a curve in E, then E is connected.


Example 8.2.10. Show that the unit circle T (the set of points (x,y) ∈ R^2 with
x^2 + y^2 = 1) is connected.


Solution: Each point on the circle T is of the form (cos t, sin t). Each pair of
such points (cos a, sin a) and (cos b, sin b) with a < b are joined by the curve

\[ \gamma(t) = (\cos t, \sin t), \quad t \in [a,b], \]

which lies in the circle. Hence, the circle T is connected.




Uniform Continuity. 


Definition 8.2.11. Let D be a subset of R^p and let F : D → R^q be a function.
Then F is said to be uniformly continuous on D if for each ε > 0 there is a δ > 0
such that

\[ \|F(x) - F(y)\| < \epsilon \quad\text{whenever } x, y \in D \text{ and } \|x - y\| < \delta. \]

As with uniform continuity for functions of one variable, discussed in Section
3.3, the point here is that the choice of δ does not depend on x or y.
Uniform continuity is an important concept and it will play a key role in our
proof of the existence of the Riemann integral of a function of several variables.
We proved in Theorem 3.3.4 that a continuous function on a closed, bounded
interval is uniformly continuous. The analogous theorem holds for functions of
several variables, but compact sets replace closed, bounded intervals.


Theorem 8.2.12. If K is a compact subset of R^p and F : K → R^q is continuous
on K, then F is uniformly continuous on K.


Proof. Since F is continuous on K, given ε > 0, we may choose for each x ∈ K a
number δ(x) > 0 such that

\[ \|F(y) - F(x)\| < \epsilon/2 \quad\text{whenever } y \in K \text{ and } \|y - x\| < \delta(x). \tag{8.2.1} \]

We set ρ(x) = δ(x)/2. Then ρ(x) is a positive, real-valued function defined on K,
just as in Example 7.4.9. In that example, we showed that a consequence of the
compactness of K is that there is a finite set of points {x_1, x_2, ..., x_n} such that K
is contained in the union of the balls B_{ρ(x_j)}(x_j) for j = 1, ..., n.


We set ρ = min{ρ(x_j) : j = 1, ..., n}. Then given any two points x, y ∈ K
with ||x − y|| < ρ, x must be in B_{ρ(x_j)}(x_j) for some j. This implies that
||x − x_j|| < ρ(x_j) < δ(x_j) and

\[ \|y - x_j\| \le \|y - x\| + \|x - x_j\| < \rho + \rho(x_j) \le 2\rho(x_j) = \delta(x_j). \]

Since both x and y are within δ(x_j) of x_j, it follows from (8.2.1) that

\[ \|F(x) - F(y)\| \le \|F(x) - F(x_j)\| + \|F(x_j) - F(y)\| < \epsilon/2 + \epsilon/2 = \epsilon. \]

Hence, F is uniformly continuous on K. □


In Theorem 3.3.6 we showed that a function is uniformly continuous on a
bounded interval if and only if it has a continuous extension to the closure of
the interval. The analogous theorem holds for functions from R^p to R^q.


Theorem 8.2.13. If D ⊂ R^p is a bounded set and F : D → R^q is a function, then
F is uniformly continuous on D if and only if F can be extended to a continuous
function F̃ : D̄ → R^q, where D̄ denotes the closure of D.


Proof. Note that, since D is bounded, D̄ is compact. Thus, if F has an extension
to a continuous function F̃ : D̄ → R^q, then F̃ is uniformly continuous on D̄, by the
previous theorem. Then F̃ is also uniformly continuous on the smaller set D. But
F̃ = F on D, and so F is uniformly continuous on D.




Conversely, suppose F is uniformly continuous on D. Then {F(x_n)} is a Cauchy
sequence in R^q whenever {x_n} is a Cauchy sequence in D (Exercise 8.2.11). If
x ∈ D̄, then there is a sequence {x_n} in D that converges to x (Theorem 7.3.10).
Such a sequence is necessarily Cauchy and so {F(x_n)} is also Cauchy. But Cauchy
sequences in R^q converge by Theorem 7.2.16.
If {y_n} is another sequence in D which converges to x, then we may construct
a third sequence {z_n} converging to x by intertwining the sequences {x_n} and {y_n};
that is, let z_{2n} = y_n and z_{2n−1} = x_n. Then, {z_n} not only converges to x, it
has both {x_n} and {y_n} as subsequences. By the above argument, the sequence
{F(z_n)} must converge to a point u ∈ R^q. Both subsequences {F(x_n)} and {F(y_n)}
must then converge to the same point u. Thus, we have proved that no matter what
sequence {x_n} converging to x we choose, the limit of the sequence {F(x_n)} is the
same. Therefore, it makes sense to define an extension F̃ of F to D̄ by setting

\[ \tilde{F}(x) = \lim_{n\to\infty} F(x_n) \]

for any sequence {x_n} in D converging to x. The resulting function is obviously
equal to F on D, since we may just choose x_n = x for all n if x ∈ D.
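
The intertwining step in this argument is a simple interleaving of two sequences. As a small illustration (the generator name `intertwine` is hypothetical, and finite lists stand in for the infinite sequences), the following sketch produces z_1, z_2, ... with z_{2n−1} = x_n and z_{2n} = y_n.

```python
def intertwine(xs, ys):
    """Yield x_1, y_1, x_2, y_2, ... so that z_{2n-1} = x_n and z_{2n} = y_n."""
    for x, y in zip(xs, ys):
        yield x
        yield y


# Two sequences converging to 0 from opposite sides, truncated to five terms.
xs = [1.0 / n for n in range(1, 6)]
ys = [-1.0 / n for n in range(1, 6)]
zs = list(intertwine(xs, ys))
```

Taking every other term recovers the originals, `zs[0::2] == xs` and `zs[1::2] == ys`, which is the sense in which both are subsequences of the intertwined sequence.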


We now have an extension F̃ of F to D̄. It remains to prove that it is continuous
on D̄. We will do this by applying Theorem 8.1.6. If {x_n} is a sequence in D̄
which converges to x ∈ D̄, we may choose for each n a point y_n ∈ D such that
||x_n − y_n|| < 1/n and ||F(y_n) − F̃(x_n)|| < 1/n. Then

\[ \|x - y_n\| \le \|x - x_n\| + \|x_n - y_n\| < \|x - x_n\| + 1/n. \]

Since ||x − x_n|| → 0 and 1/n → 0, it follows that y_n → x and, hence, F(y_n) → F̃(x)
by our definition of F̃. However, it also follows that F̃(x_n) → F̃(x) since

\[ \|\tilde{F}(x) - \tilde{F}(x_n)\| \le \|\tilde{F}(x) - F(y_n)\| + \|F(y_n) - \tilde{F}(x_n)\| \]

and both ||F(y_n) − F̃(x_n)|| and ||F̃(x) − F(y_n)|| converge to 0.


Since F̃(x_n) → F̃(x) whenever {x_n} is a sequence in D̄ converging to x ∈ D̄,
the function F̃ is continuous on D̄ by Theorem 8.1.6. □


Exercise Set 8.2 


1. If A={(x,y) € R? 0 <2 < 1,0<y< 1}, which of the following sets cannot 
be the image of the set A under a continuous function F : A + R?? Justify 
your answers. 

a Bale.8): 
(b) Bi (@). 

(c) {(a,y) ER? :@<a2<1, @<y}. 

(d) 

(e) 


Bi (0,0) U Bi(3,@). 
) (tt) R?2tER @<t<i}. 


2. Finish the proof of Theorem 8.2.1 by proving that a function is continuous if 


and only if the inverse image of each closed set is relatively closed. Hint: You 
may use the first part of the theorem (that a function is continuous if and only 
if the inverse image of each open set is relatively open). 


3. If K is a compact, connected subset of R^p and f : K → R is a continuous
function, what can you say about f(K)?

4. If F : R^p → R^q is continuous and A is a bounded subset of R^p, prove that
F(Ā) is the closure of F(A). Is this necessarily true if A is not bounded?

5. The image of a compact set under a continuous function is compact, hence
closed, by Theorem 8.2.3. Is the image of a closed set under a continuous
function necessarily closed? Prove that it is or give an example where it is not.

6. Is the image of an open set under a continuous function necessarily an open
set? Prove that it is or give an example where it is not.

7. Is the sphere {(x,y,z) ∈ R^3 : x^2 + y^2 + z^2 = 1} connected? How do you know?

8. Prove that if f : T → R is a continuous real-valued function on the unit circle
T = {(x,y) ∈ R^2 : x^2 + y^2 = 1}, then there is a pair of diametrically opposed
points (x,y) and (−x,−y) on T at which f has the same value.

9. Find an example of a closed set A ⊂ R^2 which is connected but which contains
two points that cannot be joined by a curve in A.

10. Is the function f : R^2 \ {(0,0)} → R defined by


a 


On Goa 


uniformly continuous on By(@,0)? Is it uniformly continuous on By(@,0)? Jus- 
tify your answers. 

11. If D ⊂ R^p, prove that if a function F : D → R^q is uniformly continuous on D,
then {F(x_n)} is a Cauchy sequence in R^q whenever {x_n} is a Cauchy sequence
in D.

12. Show that the converse of the statement in the previous exercise is not true in
general, but it is true if the set D is bounded. That is, show that there exist
a D and a continuous function F : D → R^q which is not uniformly continuous
but which does take each Cauchy sequence in D to a Cauchy sequence in R^q.
However, show there are no such functions if D is bounded.

13. Does uniform continuity make sense for a function from one metric space to
another? If so, how would you define it?


8.3. Sequences of Functions 


Uniform convergence of sequences of functions will play the same role in functions 
of several variables that it did in earlier chapters on functions of a single variable. 
It preserves continuity and allows the limit to be taken inside an integral. 


The results of Section 3.4 on uniform convergence hold in the several variable 


context and have almost the same proofs. 




Uniform Convergence. 


Definition 8.3.1. Let {F_n} be a sequence of functions from D to R^q, where D ⊂
R^p. We say this sequence converges pointwise to F : D → R^q on D if the sequence
{F_n(x)} converges to F(x) for each x ∈ D.

We say {F_n} converges uniformly to F : D → R^q on D if, for each ε > 0, there
is an N such that

\[ \|F(x) - F_n(x)\| < \epsilon \quad\text{whenever } x \in D \text{ and } n > N. \]


The difference between pointwise and uniform convergence is that, in the latter, 
the choice of N must be independent of «. 

The following test for uniform convergence is the several variable analogue of 
Theorem 3.4.6. The proof is simple and is left to the exercises. 


Theorem 8.3.2. Let F be a function and let {F_n} be a sequence of functions defined
on a set D ⊂ R^p and having values in R^q. If there is a sequence of non-negative
numbers {b_n}, such that b_n → 0 and

\[ \|F(x) - F_n(x)\| \le b_n \quad\text{for all } x \in D, \]

then {F_n} converges uniformly to F on D.
Example 8.3.3. Examine the convergence of the sequence {(x^2 + y^2)^n} on the
closed disc B̄_r(0,0) in R^2 for each r ≤ 1.

Solution: Note that x^2 + y^2 ≤ r^2 on B̄_r(0,0). Thus,

\[ |(x^2 + y^2)^n| \le r^{2n} \quad\text{on } \overline{B}_r(0,0). \]

If r < 1, then r^{2n} → 0 and, hence, {(x^2 + y^2)^n} converges uniformly to 0 on B̄_r(0,0)
by the previous theorem.

On B̄_1(0,0), the sequence {(x^2 + y^2)^n} converges to 0 if (x,y) is in the interior
of the disc and it converges to 1 if (x,y) is on the boundary of the disc. The limit
function is not continuous on B̄_1(0,0) and, by the next theorem, this means the
convergence is not uniform. Without using the next theorem, we can still easily see
that the convergence is not uniform; in fact, it is not uniform even on the smaller set
B_1(0,0). Given an ε with 0 < ε < 1, if (x,y) ∈ B_1(0,0) and we set r = ||(x,y)|| < 1,
then |(x^2 + y^2)^n| = r^{2n} and so

\[ |(x^2 + y^2)^n| < \epsilon \tag{8.3.1} \]

if and only if r^{2n} < ε, which holds if and only if

\[ n > N_r = \frac{\ln \epsilon}{2 \ln r}. \]

Thus, an N with the property that (8.3.1) holds for all r < 1 must be larger than
N_r for all r < 1. There is no such N, since lim_{r→1} N_r = ∞.
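
The growth of the threshold N_r = ln ε / (2 ln r) as r approaches 1 can be tabulated directly. The sketch below is illustrative only: the function name `threshold`, the choice ε = 0.1, and the sample radii are assumptions made for the example.

```python
import math


def threshold(eps, r):
    """N_r = ln(eps) / (2 ln r), for 0 < eps < 1 and 0 < r < 1."""
    return math.log(eps) / (2.0 * math.log(r))


eps = 0.1
radii = (0.5, 0.9, 0.99, 0.999)
thresholds = [threshold(eps, r) for r in radii]

# For each radius, the first integer n beyond N_r gives r^(2n) < eps,
# while the integer just before it does not.
first_good_n = [math.floor(t) + 1 for t in thresholds]
```

The entries of `thresholds` increase without bound as the radius approaches 1, which is why no single N can work on the whole open disc.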


Uniform Convergence and Continuity. One of the main reasons why uniform 
convergence is important is the following theorem. Its proof is the same as the 
proof of the analogous theorem for real-valued functions of a real variable (Theorem 
3.4.4), and we will not repeat it. 




Theorem 8.3.4. If {F_n} is a sequence of continuous functions from a subset D
of R^p to R^q, which converges uniformly on D to a function F, then F is also
continuous on D.


As we saw in Example 8.3.3, a sequence of continuous functions which converges
only pointwise may not converge to a continuous function.


Example 8.3.5. Define a sequence $\{F_n\}$ of functions from the unit ball $B_1(0,0)$ in $\mathbb{R}^2$ to $\mathbb{R}^2$ by

\[ F_n(x,y) = \left( \frac{n y^2}{1 + n y^2}, \; \frac{n x^2}{1 + n x^2} \right). \]

Show that this sequence converges pointwise but not uniformly on $B_1(0,0)$.


Solution: Each of the functions $F_n$ is continuous on $B_1(0,0)$. The sequence clearly converges pointwise to the function $F$ defined on $B_1(0,0)$ by

\[ F(x,y) = (f(y), f(x)), \quad \text{where } f(s) = \begin{cases} 1 & \text{if } s \ne 0, \\ 0 & \text{if } s = 0. \end{cases} \]

This function is not continuous on $B_1(0,0)$; in fact, it is discontinuous at all points on the $x$- and $y$-axes. So, by the previous theorem, the convergence of $\{F_n\}$ to $F$ cannot be uniform on $B_1(0,0)$.


Uniformly Cauchy Sequences. 


Definition 8.3.6. If $D \subset \mathbb{R}^p$ and if $\{F_n\}$ is a sequence of functions from $D$ to $\mathbb{R}^q$, then $\{F_n\}$ is said to be uniformly Cauchy if, for each $\epsilon > 0$, there is an $N$ such that

\[ \|F_n(x) - F_m(x)\| < \epsilon \quad \text{whenever } x \in D \text{ and } n, m > N. \]


Another several-variable analogue of a single-variable theorem (Theorem 3.4.10) is the following. Since the proof of the single-variable version was left to the exercises, we will actually prove this version.


Theorem 8.3.7. If $D \subset \mathbb{R}^p$, a sequence of functions $F_n : D \to \mathbb{R}^q$ is uniformly Cauchy if and only if it converges uniformly to some function $F : D \to \mathbb{R}^q$.

Proof. If $\{F_n\}$ converges uniformly on $D$ to a function $F$ and if $\epsilon > 0$, then there is an $N$ such that

\[ \|F(x) - F_n(x)\| < \epsilon/2 \quad \text{whenever } x \in D, \; n > N. \]

Then

\[ \|F_n(x) - F_m(x)\| \le \|F_n(x) - F(x)\| + \|F(x) - F_m(x)\| < \epsilon/2 + \epsilon/2 = \epsilon \]

whenever $x \in D$ and $n, m > N$. Thus, $\{F_n\}$ is uniformly Cauchy.

On the other hand, if $\{F_n\}$ is uniformly Cauchy, then for each $x \in D$, $\{F_n(x)\}$ is a Cauchy sequence of vectors in $\mathbb{R}^q$ and, hence, converges to some vector $F(x) \in \mathbb{R}^q$ by Theorem 7.2.16. That is, $\{F_n\}$ converges pointwise to a function $F : D \to \mathbb{R}^q$. It remains to prove that the convergence is uniform.


8.3. Sequences of Functions 205 


Since the sequence is uniformly Cauchy, for each $\epsilon > 0$ there is an $N$ such that

\[ \|F_n(x) - F_m(x)\| < \epsilon/2 \quad \text{whenever } x \in D \text{ and } n, m > N. \]

If $m > n > N$ we have

\[ \|F(x) - F_n(x)\| \le \|F(x) - F_m(x)\| + \|F_m(x) - F_n(x)\| < \|F_m(x) - F(x)\| + \epsilon/2. \]

The left side of this inequality does not depend on $m$, and the inequality holds for all $m > n$. For each $x \in D$, $\lim_{m \to \infty} \|F(x) - F_m(x)\| = 0$. Hence, on taking the limit of the above inequality as $m \to \infty$, we conclude that

\[ \|F(x) - F_n(x)\| \le \epsilon/2 < \epsilon \quad \text{for all } x \in D \text{ and } n > N. \]

This proves that $\{F_n\}$ converges uniformly to $F$ on $D$. □


The Sup Norm. If $D$ is a compact subset of $\mathbb{R}^p$, each continuous function $F$ from $D$ to $\mathbb{R}^q$ is bounded, by Theorem 8.2.4. That is, $\sup_D \|F(x)\|$ is finite and, in fact, $\|F(x)\|$ actually assumes this value at some point of $D$. We set

\[ \|F\|_D = \sup_{x \in D} \|F(x)\|. \]

This is a norm on the vector space of all continuous functions from $D$ to $\mathbb{R}^q$.


Example 8.3.8. Find $\|\gamma\|_I$ if $I$ is the interval $[0, \pi]$ and $\gamma : I \to \mathbb{R}^2$ is the curve defined by

\[ \gamma(t) = (\cos t, \; 1 + \sin t). \]

We have

\[ \|\gamma(t)\| = \sqrt{\cos^2 t + (1 + \sin t)^2} = \sqrt{2 + 2 \sin t}. \]

This attains its maximum value on $[0, \pi]$ at $t = \pi/2$, where it has the value $2$. Thus, $\|\gamma\|_I = 2$.
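The value found in Example 8.3.8 can be checked by sampling. This is a rough numerical sketch under the assumption that a fine uniform grid on $[0, \pi]$ is adequate for a smooth curve. The helper `sup_norm_on_grid` is a hypothetical name and only approximates the supremum from below.

```python
import math

def gamma(t: float) -> tuple:
    """The curve of Example 8.3.8."""
    return (math.cos(t), 1.0 + math.sin(t))

def sup_norm_on_grid(f, a: float, b: float, samples: int = 10001) -> float:
    """Largest Euclidean norm of f(t) over a uniform grid on [a, b]."""
    best = 0.0
    for k in range(samples):
        t = a + (b - a) * k / (samples - 1)
        best = max(best, math.hypot(*f(t)))
    return best

approx = sup_norm_on_grid(gamma, 0.0, math.pi)
print(approx)  # close to 2, the value attained at t = pi/2
```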


Theorem 8.3.9. If $D$ is a compact subset of $\mathbb{R}^p$ and $\{F_n\}$ is a sequence of continuous functions from $D$ to $\mathbb{R}^q$, then $\{F_n\}$ converges uniformly to a function $F : D \to \mathbb{R}^q$ if and only if $\lim_{n \to \infty} \|F - F_n\|_D = 0$.


Proof. Given any $\epsilon > 0$ and any $n$, the inequality $\|F(x) - F_n(x)\| \le \epsilon$ holds for all $x \in D$ if and only if $\|F - F_n\|_D \le \epsilon$. Thus, $\{F_n\}$ converges uniformly to $F$ if and only if $\lim_{n \to \infty} \|F - F_n\|_D = 0$. □


The space $C(K; \mathbb{R}^q)$ of all continuous functions on a compact set $K \subset \mathbb{R}^p$ with values in $\mathbb{R}^q$ is a vector space under the operations of pointwise addition and scalar multiplication of functions. If we define the norm of an element $F$ of this space to be the sup norm $\|F\|_K$, then it is easy to see that $C(K; \mathbb{R}^q)$ is a normed vector space (Exercise 8.3.11). In particular, it is a metric space in which the distance between two elements $F$ and $G$ is defined to be $\|F - G\|_K$. It turns out that this is a complete metric space (meaning that all Cauchy sequences converge).


Theorem 8.3.10. The normed vector space $C(K; \mathbb{R}^q)$ is complete.


Proof. A Cauchy sequence in $C(K; \mathbb{R}^q)$ is by definition a sequence of continuous functions which is Cauchy in the metric defined by the norm $\|\cdot\|_K$. Such a sequence is uniformly Cauchy on $K$. By Theorem 8.3.7 such a sequence converges uniformly on $K$. The limit function is continuous, by Theorem 8.3.4. By the previous theorem, the sequence converges in the metric defined by $\|\cdot\|_K$ to this limit. Thus, each Cauchy sequence in the metric space $C(K; \mathbb{R}^q)$ converges to an element of $C(K; \mathbb{R}^q)$ and, hence, this space is complete. □


Series of Functions. Given a series

\[ \sum_{k=1}^{\infty} F_k(x) \tag{8.3.2} \]

whose terms $F_k$ are functions from a domain $D \subset \mathbb{R}^p$ into $\mathbb{R}^q$, we define its associated sequence of partial sums $\{S_n\}$ in the usual way:

\[ S_n(x) = \sum_{k=1}^{n} F_k(x). \]

The series converges pointwise if its sequence of partial sums converges pointwise. It converges uniformly on $D$ if its sequence of partial sums converges uniformly on $D$.


As in the single-variable case, there is a simple condition (the Weierstrass M-test) which ensures that a series converges uniformly. The proof is the same as the proof of Theorem 6.4.4 and so we will not repeat it.


Theorem 8.3.11 (Weierstrass M-test). If there is a convergent series of non-negative numbers

\[ \sum_{k=1}^{\infty} M_k \]

such that $\|F_k(x)\| \le M_k$ for all $k$ and all $x \in D$, then the series (8.3.2) converges uniformly on $D$.


Example 8.3.12. Show that the series

\[ \sum_{k=1}^{\infty} \frac{1}{k^2} \sin kx \cos ky \tag{8.3.3} \]

converges uniformly on $\mathbb{R}^2$.


Solution: Since

\[ \left| \frac{1}{k^2} \sin kx \cos ky \right| \le \frac{1}{k^2} \quad \text{for all } k, x, y \]

and the series $\sum_{k=1}^{\infty} 1/k^2$ converges (it's a $p$-series with $p = 2$), the Weierstrass M-test tells us that the series (8.3.3) converges uniformly on $\mathbb{R}^2$.
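The uniform bound behind the M-test in Example 8.3.12 can be observed numerically. The sketch below is illustrative only: the sample points are arbitrary, and the estimate $\sum_{k>n} 1/k^2 < 1/n$ (from $1/k^2 < 1/(k(k-1))$, which telescopes) is a standard bound supplied here, not taken from the text.

```python
import math

def partial_sum(x: float, y: float, n: int) -> float:
    """n-th partial sum of the series (8.3.3) at the point (x, y)."""
    return sum(math.sin(k * x) * math.cos(k * y) / k**2 for k in range(1, n + 1))

def tail_bound(n: int) -> float:
    """Upper bound for sum_{k > n} 1/k^2, independent of (x, y)."""
    return 1.0 / n

# The gap between two partial sums is controlled by the same bound at every point.
points = [(0.0, 0.0), (1.0, 2.0), (-3.5, 10.0), (100.0, -7.25)]
gaps = [abs(partial_sum(x, y, 2000) - partial_sum(x, y, 50)) for x, y in points]
print(max(gaps), "<=", tail_bound(50))
```

The point of the comparison is that the bound on the right does not depend on $(x, y)$, which is what uniform convergence requires.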


Exercise Set 8.3

1. Show that the sequence $\{\gamma_n(t)\}$, where


Oe (ar x): 


does not converge uniformly on $[0, 1]$.


2. Show that the sequence $\{\lambda_n(t)\}$, where


t t 
An(t) = (ae x) , 


does converge uniformly on $[0, 1]$.

3. Does the sequence $\{(k^{-1} \sin kx, \; k^{-1} \cos ky)\}$ converge pointwise on $\mathbb{R}^2$? Does it converge uniformly on $\mathbb{R}^2$? Justify your answers.

4. Does the sequence $\{(\sin(x/k), \cos(y/k))\}$ converge pointwise on $\mathbb{R}^2$? Does it converge uniformly on $\mathbb{R}^2$? Justify your answer.

5. Find $\|F\|_D$ if $D = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$ and $F : \mathbb{R}^2 \to \mathbb{R}^2$ is defined by $F(x,y) = (x + 1, y + 1)$.

6. Find $\|\gamma\|_I$ if $I = [0, \pi]$ and $\gamma : I \to \mathbb{R}^2$ is defined by $\gamma(t) = (2 \cos t, 3 \sin t)$.

7. Prove that if $\{F_n\}$ is a sequence of bounded functions from a set $D \subset \mathbb{R}^p$ into $\mathbb{R}^q$ and if $\{F_n\}$ converges uniformly to $F$ on $D$, then $F$ is also bounded.

8. Does the series $\sum_{k=1}^{\infty} x^k y^k$ converge uniformly on the square $\{(x,y) \in \mathbb{R}^2 : -1 < x < 1, \; -1 < y < 1\}$? Justify your answer.

9. Does the series $\sum_{k=1}^{\infty} x^k y^k$ converge uniformly on the disc $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$? Justify your answer.

10. Does the series $\sum_{k=1}^{\infty} (x^k, (1 - x)^k)$ converge pointwise on $[0,1]$? Does it converge pointwise on $(0,1)$? On which subsets of $(0,1)$ does it converge uniformly? Justify your answers.

11. If $K$ is a compact subset of $\mathbb{R}^p$, show that $\|\cdot\|_K$ is a norm on the vector space $C(K; \mathbb{R}^q)$ of continuous functions on $K$ with values in $\mathbb{R}^q$.

12. Prove that if $D$ is a subset of $\mathbb{R}^p$ and $\{F_n\}$ is a sequence of functions from $D$ to $\mathbb{R}^q$, then $\{F_n\}$ fails to converge uniformly to $0$ if and only if there is a sequence $\{x_n\}$ in $D$ such that the sequence $\{F_n(x_n)\}$ does not converge to $0$.

13. If $K \subset \mathbb{R}^p$ is compact, show that a series $\sum_{k=1}^{\infty} F_k(x)$ of functions from $K$ to $\mathbb{R}^q$ converges uniformly on $K$ if the series of numbers $\sum_{k=1}^{\infty} \|F_k\|_K$ converges.

8.4. Linear Functions, Matrices 


Other than constants, linear functions are the simplest functions from $\mathbb{R}^p$ to $\mathbb{R}^q$. For example, the linear functions from $\mathbb{R}$ to $\mathbb{R}$ are the functions of the form

\[ L(x) = mx, \]

where $m$ is a constant; that is, they are functions whose graphs are straight lines through the origin. In this section we introduce and study linear functions between Euclidean spaces. In the next chapter we will show how to use linear functions to approximate more complicated functions.


Linear Functions. 


Definition 8.4.1. A function $L : \mathbb{R}^p \to \mathbb{R}^q$ is said to be linear if, whenever $x, y \in \mathbb{R}^p$ and $a \in \mathbb{R}$,

(a) $L(x + y) = L(x) + L(y)$ and

(b) $L(ax) = aL(x)$.


Linear functions are often called linear transformations or linear operators. Combining (a) and (b) of this definition, we see that a linear function preserves linear combinations of vectors. That is,

\[ L(ax + by) = aL(x) + bL(y) \tag{8.4.1} \]

for all pairs of vectors $x, y \in \mathbb{R}^p$ and all pairs of scalars $a, b$. An induction argument shows that the analogous result holds for linear combinations of more than two vectors.


Note that, since the definition uses only addition and scalar multiplication, linear functions between any two vector spaces may be defined in the same way as linear functions between $\mathbb{R}^p$ and $\mathbb{R}^q$.


Example 8.4.2. Determine whether the functions $F$, $G$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and $H$ from $\mathbb{R}^2$ to $\mathbb{R}$ are linear, where


$F(x,y) = (2x + y, \; x - y)$,
G(x, y) = (2? 
cia ee; 


    if $(x,y) \ne (0,0)$,
    $0$ if $(x,y) = (0,0)$.

Solution: The function $F$ is linear since, given two vectors $u = (x_1, y_1)$ and $v = (x_2, y_2)$ in $\mathbb{R}^2$ and a scalar $a$, we have

\begin{align*}
F(u + v) &= F(x_1 + x_2, \; y_1 + y_2) \\
&= (2(x_1 + x_2) + (y_1 + y_2), \; (x_1 + x_2) - (y_1 + y_2)) \\
&= ((2x_1 + y_1) + (2x_2 + y_2), \; (x_1 - y_1) + (x_2 - y_2)) = F(u) + F(v)
\end{align*}

and

\begin{align*}
F(au) &= F(ax_1, ay_1) = (2(ax_1) + ay_1, \; ax_1 - ay_1) \\
&= (a(2x_1 + y_1), \; a(x_1 - y_1)) = aF(u).
\end{align*}

The function $G$ is not linear since, if $u = (1,0)$, then

\[ G(2u) = ((2)^2, 2) = (4, 2), \]

while

\[ 2G(u) = 2(1^2, 1) = (2, 2). \]

These are not equal and so (b) of the above definition does not hold for $G$.

The function $H$ is also not linear. If $u = (1,0)$ and $v = (0,1)$, then

\[ H(u) = H(v) = H(u + v) = 1. \]

Thus, $H(u + v) \ne H(u) + H(v)$ and (a) of the definition does not hold (note that (b) does hold for this function).
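Conditions (a) and (b) of Definition 8.4.1 can be spot-checked on sample vectors. The following sketch tests the map $F(x,y) = (2x + y, x - y)$ from Example 8.4.2 with integer inputs so that the comparisons are exact. The helper names and the map `square_first` are hypothetical illustrations, not taken from the text, and passing a finite test of this kind does not prove linearity; a single failure does disprove it.

```python
def F(v):
    """The map F(x, y) = (2x + y, x - y) of Example 8.4.2."""
    x, y = v
    return (2 * x + y, x - y)

def square_first(v):
    """A hypothetical nonlinear map, used only for contrast."""
    x, y = v
    return (x * x, y)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    return tuple(c * a for a in v)

def looks_linear(f, samples, scalars) -> bool:
    """Check additivity and homogeneity on the given finite test data."""
    additive = all(f(add(u, v)) == add(f(u), f(v)) for u in samples for v in samples)
    homogeneous = all(f(scale(c, u)) == scale(c, f(u)) for c in scalars for u in samples)
    return additive and homogeneous

samples = [(1, 0), (0, 1), (2, -3), (-1, 4)]
scalars = [0, 2, -3]
print(looks_linear(F, samples, scalars))
print(looks_linear(square_first, samples, scalars))
```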


Linear Functions and Matrices. Recall that each vector $x \in \mathbb{R}^p$ may be written as a linear combination of the vectors $e_j$, where

\[ e_j = (0, \ldots, 0, 1, 0, \ldots, 0) \]

with the $1$ in the $j$th place. Specifically,

\[ x = \sum_{j=1}^{p} x_j e_j, \tag{8.4.2} \]

where $x_j$ is the $j$th component of the vector $x$.

If we apply a linear function $L : \mathbb{R}^p \to \mathbb{R}^q$ to the vector $x$ and use the fact that linear functions preserve linear combinations, we conclude that

\[ L(x) = \sum_{j=1}^{p} x_j L(e_j). \]

The vector $L(e_j) \in \mathbb{R}^q$ has $i$th component $e_i \cdot L(e_j)$. If we set

\[ a_{ij} = e_i \cdot L(e_j), \tag{8.4.3} \]

then the $i$th component $y_i$ of the vector $y = L(x)$ is

\[ y_i = \sum_{j=1}^{p} a_{ij} x_j. \tag{8.4.4} \]

The numbers $(a_{ij})$, appearing in (8.4.4), form a $q \times p$ matrix, that is, a rectangular array

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{pmatrix}
\]

with $q$ rows and $p$ columns. The equation $y = L(x)$ can be expressed in vector-matrix notation as

\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_q \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}.
\tag{8.4.5}
\]

In this notation, the vectors $x$ and $y$ are written as column vectors. The expression on the right is the vector-matrix product of the matrix $A = (a_{ij})$ and the vector $x = (x_j)$. It is defined to be the vector whose $i$th component is the inner product of the $i$th row of $A$ with the vector $x$.


At this point, we have shown that, to each linear function $L : \mathbb{R}^p \to \mathbb{R}^q$, there corresponds a $q \times p$ matrix $A$ such that

\[ L(x) = Ax, \]

where $Ax$ is the vector-matrix product of $A$ with $x$, as in (8.4.5). On the other hand, each $q \times p$ matrix $A$ determines a linear function in this way, since vector-matrix multiplication satisfies

\[ A(x + y) = Ax + Ay \quad \text{and} \quad A(cx) = c(Ax), \]

for every pair of vectors $x, y \in \mathbb{R}^p$ and every scalar $c \in \mathbb{R}$ (Exercise 8.4.11).


Note that, in the correspondence between a linear function $L$ and its matrix $A$, the $j$th column of $A$ is the vector $L(e_j)$. The following theorem summarizes the above discussion.


Theorem 8.4.3. A function $L : \mathbb{R}^p \to \mathbb{R}^q$ is linear if and only if there is a $q \times p$ matrix $A$ such that

\[ L(x) = Ax \quad \text{for all } x \in \mathbb{R}^p. \]

Example 8.4.4. If a function $L$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ is defined by

\[ L(x,y,z) = (x + 2y - z, \; y + z, \; 3x - y + z), \]

then is $L$ linear? If so, what matrix represents it?


Solution: If we write $L(x,y,z)$ as a column vector, then it clearly is given by

\[
L(x,y,z) = \begin{pmatrix} x + 2y - z \\ y + z \\ 3x - y + z \end{pmatrix}
= \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 3 & -1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]

Since $L$ is given by a matrix through vector-matrix multiplication, it is linear by Theorem 8.4.3.
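The correspondence between a linear function and its matrix can be made concrete: the $j$th column of the matrix is the image of $e_j$. The sketch below rebuilds the matrix of the map $L$ from Example 8.4.4 in this way. The helper names `standard_basis`, `matrix_of`, and `apply` are illustrative, and the code assumes its input really is linear.

```python
def L(v):
    """The map L(x, y, z) = (x + 2y - z, y + z, 3x - y + z) of Example 8.4.4."""
    x, y, z = v
    return (x + 2 * y - z, y + z, 3 * x - y + z)

def standard_basis(p: int):
    """The vectors e_1, ..., e_p of R^p."""
    return [tuple(1 if i == j else 0 for i in range(p)) for j in range(p)]

def matrix_of(f, p: int):
    """Matrix (as a list of rows) whose j-th column is f(e_j)."""
    columns = [f(e) for e in standard_basis(p)]
    q = len(columns[0])
    return [[columns[j][i] for j in range(p)] for i in range(q)]

def apply(A, v):
    """Vector-matrix product: i-th entry is the inner product of row i with v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

A = matrix_of(L, 3)
print(A)
print(apply(A, (2, -1, 5)), L((2, -1, 5)))
```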


Matrix Operations. The sum $L + M$ of two linear functions $L : \mathbb{R}^p \to \mathbb{R}^q$ and $M : \mathbb{R}^p \to \mathbb{R}^q$ is defined pointwise, as is the sum of any two functions with a common domain. That is, $(L + M)(x) = L(x) + M(x)$. The function $L + M$ is also a linear function since

\begin{align*}
(L + M)(x + y) &= L(x + y) + M(x + y) \\
&= L(x) + L(y) + M(x) + M(y) = (L + M)(x) + (L + M)(y),
\end{align*}

for all $x, y \in \mathbb{R}^p$, and

\[ (L + M)(ax) = L(ax) + M(ax) = aL(x) + aM(x) = a(L + M)(x), \]

for all $x \in \mathbb{R}^p$ and $a \in \mathbb{R}$.


Similarly, the product $cL$ of a scalar $c$ with a linear function $L$ is defined by $(cL)(x) = cL(x)$. This is also, clearly, a linear function.

If $M : \mathbb{R}^p \to \mathbb{R}^q$ and $L : \mathbb{R}^q \to \mathbb{R}^s$ are linear functions, then the composition $L \circ M : \mathbb{R}^p \to \mathbb{R}^s$ is defined, where

\[ L \circ M(x) = L(M(x)). \]

This is also a linear function since

\begin{align*}
(L \circ M)(x + y) &= L(M(x + y)) = L(M(x) + M(y)) \\
&= L(M(x)) + L(M(y)) = L \circ M(x) + L \circ M(y),
\end{align*}

for all $x, y \in \mathbb{R}^p$, and

\[ L \circ M(ax) = L(M(ax)) = L(aM(x)) = aL(M(x)) = aL \circ M(x), \]

for all $x \in \mathbb{R}^p$ and all $a \in \mathbb{R}$.

In view of the above, it is natural to ask, for linear functions $L$ and $M$ represented by matrices $A$ and $B$, what are the matrices representing $L + M$, $cL$, and $L \circ M$? The answer is given in the next two theorems. They have simple proofs based on the fact that, if the matrix $A$ represents the linear function $L$, then the $j$th column of $A$ is $L(e_j)$ (this is just equation (8.4.3)). The details are left to the exercises.


Theorem 8.4.5. If $L : \mathbb{R}^p \to \mathbb{R}^q$ and $M : \mathbb{R}^p \to \mathbb{R}^q$ are linear functions represented by matrices $A = (a_{ij})$ and $B = (b_{ij})$, respectively, and if $c \in \mathbb{R}$, then $L + M$ and $cL$ are represented by the matrices

\[ A + B = (a_{ij} + b_{ij}) \quad \text{and} \quad cA = (c\,a_{ij}). \]


These are the usual operations of addition and scalar multiplication of matrices. The entry in the $i$th row and $j$th column of $A + B$ is $a_{ij} + b_{ij}$, while that of $cA$ is $c\,a_{ij}$.

Theorem 8.4.6. If $L : \mathbb{R}^q \to \mathbb{R}^s$ and $M : \mathbb{R}^p \to \mathbb{R}^q$ are linear functions represented by matrices $A = (a_{ij})$ and $B = (b_{jk})$, then $L \circ M : \mathbb{R}^p \to \mathbb{R}^s$ is represented by the matrix $AB = (c_{ik})$, where

\[ c_{ik} = \sum_{j=1}^{q} a_{ij} b_{jk}. \]


This is the usual operation of matrix multiplication. The entry in the $i$th row and $k$th column of $AB$ is the inner product of the $i$th row of $A$ with the $k$th column of $B$.
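Theorem 8.4.6 says that composing linear functions corresponds to multiplying their matrices. A small sketch can confirm this on one example; the matrices `A` and `B` below are hypothetical, chosen only so that the shapes are compatible.

```python
def matmul(A, B):
    """Matrix product: entry (i, k) is the sum over j of A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, v):
    """Vector-matrix product of A with the vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

A = [[1, 2], [0, 1], [3, -1]]   # hypothetical 3 x 2 matrix, representing L : R^2 -> R^3
B = [[2, 0, 1], [1, -1, 4]]     # hypothetical 2 x 3 matrix, representing M : R^3 -> R^2

AB = matmul(A, B)               # 3 x 3 matrix, representing L o M : R^3 -> R^3
x = (1, 2, 3)
print(AB)
print(apply(AB, x), apply(A, apply(B, x)))
```

Applying `AB` to a vector gives the same result as applying `B` first and then `A`, which is the content of the theorem for this particular pair.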


Example 8.4.7. If A= G : ci and B= ( ; a then find 2A — B. 


Solution: We have 
2-@ 4-1 -2-3 2: 3 <5 
ees ey 2-@ 2-1 ees a4 i 
The transpose $A^t$ of a matrix $A$ is the matrix obtained by interchanging the rows and columns of $A$. That is, if $A = (a_{ij})$, then $A^t = (b_{ij})$, where $b_{ij} = a_{ji}$.


Example 8.4.8. If $A$ is the matrix of the previous example, then find $A^t$, $AA^t$, and $A^tA$.


Solution: By definition, we have 


1 6 
Ab SD Tle. 
=I) 1 


while 




and 


1 
AA=(2 1 F : fa 50 Bhs 
tae 


Norm of a Linear Transformation. 


Definition 8.4.9. A linear transformation $L$ from a normed vector space $X$ to a normed vector space $Y$ is said to be bounded if the set

\[ \left\{ \frac{\|L(x)\|}{\|x\|} : x \in X, \; x \ne 0 \right\} \tag{8.4.6} \]

is bounded above. In this case, the least upper bound of this set is called the operator norm of $L$ and is denoted $\|L\|$.


Equivalently, a linear transformation $L$ is bounded if there is a number $B$ such that

\[ \|L(x)\| \le B\|x\| \quad \text{for all } x \in X. \]

The operator norm $\|L\|$ of $L$ is the least such number $B$.


Theorem 8.4.10. If $X$ and $Y$ are normed vector spaces, then every bounded linear transformation $L : X \to Y$ is uniformly continuous on $X$.

Proof. If $x_1, x_2 \in X$, then

\[ \|L(x_1) - L(x_2)\| = \|L(x_1 - x_2)\| \le \|L\| \, \|x_1 - x_2\|. \]

If $\|L\| = 0$, then $L$ is the zero transformation and there is nothing to prove. Otherwise, given $\epsilon > 0$, if we choose $\delta = \epsilon/\|L\|$, then

\[ \|L(x_1) - L(x_2)\| \le \|L\| \, \|x_1 - x_2\| < \epsilon \quad \text{whenever } \|x_1 - x_2\| < \delta. \]

This shows that $L$ is uniformly continuous on $X$. □


Theorem 8.4.11. Every linear transformation $L : \mathbb{R}^p \to \mathbb{R}^q$ is bounded and, hence, uniformly continuous. Furthermore,

\[ \|L\| \le \left( \sum_{i,j} a_{ij}^2 \right)^{1/2}, \]

where $A = (a_{ij})$ is the matrix which determines $L$.


Proof. Let $A$ be the matrix which determines $L$ and let $r_i$ be the $i$th row of $A$. Then the $i$th component of $y = L(x) = Ax$ is the inner product $y_i = r_i \cdot x$. By the Cauchy-Schwarz inequality (Theorem 7.1.8)

\[ |y_i| \le \|r_i\| \, \|x\|. \]

Thus,

\begin{align*}
\|L(x)\| = (y_1^2 + \cdots + y_q^2)^{1/2} &\le (\|r_1\|^2 + \cdots + \|r_q\|^2)^{1/2} \, \|x\| \\
&= \left( \sum_{i,j} a_{ij}^2 \right)^{1/2} \|x\|.
\end{align*}

This implies that $L$ is bounded and $\|L\| \le \left( \sum_{i,j} a_{ij}^2 \right)^{1/2}$. □
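The bound in Theorem 8.4.11 can be compared with a sampled estimate of the operator norm. This sketch is restricted to transformations of $\mathbb{R}^2$, where unit vectors can be parametrized by an angle. The matrix `A` is a hypothetical example, and the sampled maximum is only a lower estimate of $\|L\|$.

```python
import math

def apply(A, v):
    """Vector-matrix product of A with the vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

def norm(v) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

def frobenius_bound(A) -> float:
    """Square root of the sum of the squares of all entries of A."""
    return math.sqrt(sum(a * a for row in A for a in row))

def operator_norm_estimate(A, samples: int = 3600) -> float:
    """Largest value of ||Ax|| over sampled unit vectors x in R^2."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        best = max(best, norm(apply(A, (math.cos(t), math.sin(t)))))
    return best

A = [[1, 2], [3, 4]]  # hypothetical example matrix
estimate = operator_norm_estimate(A)
print(estimate, "<=", frobenius_bound(A))
```

For this matrix the two numbers are close but not equal, so the inequality in the theorem can be strict.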


Inverse of a Matrix. Of particular interest in matrix theory are square matrices, that is, $p \times p$ matrices for some $p$. The product of two $p \times p$ matrices is another $p \times p$ matrix and so the set of $p \times p$ matrices is closed under multiplication.

There is a multiplicative identity $I$ in the set of $p \times p$ matrices. This is the matrix $I = (\delta_{ij})$ where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. It has the property that

\[ AI = IA = A, \]

for any $p \times p$ matrix $A$.

If $A$ is a $p \times p$ matrix, then an inverse for $A$ is a $p \times p$ matrix $A^{-1}$ such that

\[ A^{-1}A = AA^{-1} = I. \]

By Cramer's Rule, a square matrix has an inverse if and only if its determinant $\det A$ is non-zero and, in this case,

\[ A^{-1} = \frac{1}{\det A} (A^c)^t, \]

where $A^c$ is the matrix of cofactors of $A$, that is, $A^c = ((-1)^{i+j} \det A_{ij})$, where $A_{ij}$ is the $(p-1) \times (p-1)$ matrix obtained by deleting the $i$th row and $j$th column from $A$.


A matrix is said to be non-singular if it has an inverse, that is, if its determinant is non-zero. A square matrix is singular if it fails to have an inverse.


Note that if $L : \mathbb{R}^p \to \mathbb{R}^p$ is a linear transformation with matrix $A$, then $A$ has an inverse matrix $A^{-1}$ if and only if $L$ has an inverse transformation $L^{-1}$. In this case, the linear transformation $L^{-1}$ has $A^{-1}$ as its associated matrix.
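The cofactor formula for the inverse translates directly into code. The following is a minimal sketch for small integer matrices, using exact rational arithmetic; it is meant to mirror the formula, not to be an efficient method. The function names and the test matrix `M` are illustrative assumptions.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """Transpose of the cofactor matrix divided by the determinant."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(A)
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

M = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # illustrative integer matrix with determinant 2
print(det(M))
print(matmul(M, inverse(M)))
```

Cofactor expansion takes on the order of $p!$ operations, so this approach is only sensible for very small matrices.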


Example 8.4.12. Let 


2c el 2-1 
Cae ee: 
For each of A and B, determine if the matrix has an inverse and, if it does, find it. 


Solution: The matrices $A$ and $B$ have determinants

\[ \det A = 2 + 1 = 3 \quad \text{and} \quad \det B = 2 - 2 = 0. \]

Thus, $A$ has an inverse and $B$ does not. By Cramer's Rule, the inverse of $A$ is


s(t 2)=(is 2p): 


Remark 8.4.13. In what follows, we will often ignore the difference between a linear function $L$ and the matrix which represents it. They are not exactly the same. The matrix of a linear transformation depends on a choice of coordinate systems in $\mathbb{R}^p$ and $\mathbb{R}^q$, while the linear transformation is independent of the choice of coordinates. To ignore the distinction will not cause problems as long as we stick with one coordinate system in each space. There will, however, be occasions where we change coordinate systems in $\mathbb{R}^p$ or $\mathbb{R}^q$ or both while dealing with a given linear transformation. It should be understood that the matrix corresponding to the linear transformation will, as a result, also change.


Exercise Set 8.4


The first five exercises involve the matrices


3-1 2 5 
a=( i a=( if e=(4 -6 
ae. -2 2 Ap Sh 


1. Find $2A + B$, $A - B$, $AB$, and $BA$.

2. Find $\det A$ and $\det B$ and $A^{-1}$ and $B^{-1}$.

3. Find $CD$ and $DC$.

4. Based on the result of the previous exercise, can you tell what (CD)? is without doing any further calculation?

5. Find $\det CD$.

6. Is the function $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(x,y) = (x + y, xy)$ a linear transformation? If so, what is its matrix?

7. Is the function $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(x,y) = (x + y, x - y)$ a linear transformation? If so, what is its matrix?

8. Is the transformation of $\mathbb{R}^2$ to itself which rotates every vector through an angle $\theta$ (counterclockwise rotations have positive angle and clockwise rotations have negative angle) a linear transformation? If so, what is its matrix?

9. What is the matrix for the linear transformation of $\mathbb{R}^2$ which reflects each point through the diagonal line $y = x$ (this transformation interchanges the $x$ and $y$ coordinates of each point)?

10. Find a linear transformation $L : \mathbb{R}^3 \to \mathbb{R}^3$ such that $L(1,2,1) = (1,2,1)$ and $L(u) = 0$ for every vector $u \in \mathbb{R}^3$ which is orthogonal to $(1,2,1)$.

11. Prove that if $A$ is a $q \times p$ matrix, then

\[ A(x + y) = Ax + Ay \quad \text{and} \quad A(cx) = c(Ax), \]

for every pair of vectors $x, y \in \mathbb{R}^p$ and every scalar $c \in \mathbb{R}$.

12. Prove Theorem 8.4.5.

13. Prove Theorem 8.4.6.

14. Prove that if $K$ and $L$ are linear transformations from $\mathbb{R}^p$ to $\mathbb{R}^q$, then

\[ \|K + L\| \le \|K\| + \|L\|. \]

15. Prove that if $K : \mathbb{R}^p \to \mathbb{R}^q$ and $L : \mathbb{R}^q \to \mathbb{R}^s$ are linear transformations, then

\[ \|L \circ K\| \le \|L\| \, \|K\|. \]

16. Prove that the operator norm of a $p \times p$ diagonal matrix is equal to the largest absolute value of the elements on the diagonal.




8.5. Dimension, Rank, Lines, and Planes 


A vector space $X$ has finite dimension if it contains a finite set $\{x_1, x_2, \ldots, x_k\}$ of vectors which span $X$, that is, every vector in $X$ is a linear combination of the vectors $x_j$. If this set is also linearly independent, meaning the only linear combination of the vectors $x_j$ that equals $0$ is the one in which all coefficients are zero, then the set $\{x_1, x_2, \ldots, x_k\}$ is called a basis for $X$. In this case, each element of $X$ is a unique linear combination of the vectors $x_j$. Every finite-dimensional vector space $X$ has a basis. In fact $X$ has many bases, but each of them has the same number of elements. This number is called the dimension of $X$ and is written $\dim(X)$.

A subset $M$ of a vector space $X$ is called a linear subspace if it is closed under addition and scalar multiplication, that is, $x + y \in M$ and $ax \in M$ whenever $x, y \in M$ and $a \in \mathbb{R}$. It follows that a linear subspace $M$ of a vector space is itself a vector space, with addition and scalar multiplication in $M$ defined in the same way they are defined in $X$. If $X$ is finite dimensional, then so is the subspace $M$ and any basis $\{x_1, x_2, \ldots, x_m\}$ for $M$ can be expanded to a basis

\[ \{x_1, x_2, \ldots, x_m, x_{m+1}, \ldots, x_n\} \]

for $X$. Thus

\[ \dim(M) \le \dim(X). \]

The set $\{e_1, \ldots, e_p\}$ is a basis for $\mathbb{R}^p$, where recall that $e_j$ is the $p$-tuple which has $1$ for its $j$th component and $0$ for all the others. However, this is not the only basis for $\mathbb{R}^p$.


Example 8.5.1. Show that the vectors $u = (1,0,1)$, $v = (1,1,0)$, and $w = (0,1,1)$ form a basis for $\mathbb{R}^3$.


Solution: Consider the vector equation

\[ au + bv + cw = y. \tag{8.5.1} \]

To show that $\{u, v, w\}$ spans $\mathbb{R}^3$, we must show that this equation has a solution for every $y$. To show that $\{u, v, w\}$ is a linearly independent set, we must show that if $y = 0$, then this equation has only the zero solution for $(a,b,c)$. Taken together, these two statements mean that equation (8.5.1) should have a unique solution for every $y \in \mathbb{R}^3$. The vector equation (8.5.1) is equivalent to the system of linear equations

\begin{align*}
a + b + 0 &= y_1, \\
0 + b + c &= y_2, \\
a + 0 + c &= y_3,
\end{align*}

which, in turn, may be written as the vector-matrix equation

\[
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}.
\]

The matrix in this equation has determinant $2$ and so the matrix has an inverse. This implies that the equation has a unique solution $(a,b,c)$ for each $y = (y_1, y_2, y_3)$ and, hence, that $\{u, v, w\}$ is a basis for $\mathbb{R}^3$.
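The computation in Example 8.5.1 can be carried out explicitly. The sketch below evaluates the determinant and then solves $au + bv + cw = y$ for one sample $y$ by Cramer's rule with exact fractions. Using Cramer's rule for the solve is a choice made here for brevity (the text only argues that a unique solution exists), and the helper names and sample vector are illustrative.

```python
from fractions import Fraction

def det3(c1, c2, c3):
    """Determinant of the 3 x 3 matrix whose columns are c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))

u, v, w = (1, 0, 1), (1, 1, 0), (0, 1, 1)
d = det3(u, v, w)

def coordinates(y):
    """Coefficients (a, b, c) with a*u + b*v + c*w = y, by Cramer's rule."""
    return (Fraction(det3(y, v, w), d),
            Fraction(det3(u, y, w), d),
            Fraction(det3(u, v, y), d))

print(d)
print(coordinates((2, 3, 5)))
```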


Definition 8.5.2. If $L : X \to Y$ is a linear transformation between vector spaces, then the image of $L$, denoted $\operatorname{im}(L)$, is the set

\[ L(X) = \{L(x) : x \in X\}, \]

while the kernel of $L$, denoted $\ker(L)$, is the set

\[ \{x \in X : L(x) = 0\}. \]


Since $L$ is linear, it follows easily that its kernel and image are linear subspaces of $X$ and $Y$, respectively.


Theorem 8.5.3. If $L : X \to Y$ is a linear transformation between finite-dimensional vector spaces, then

\[ \dim(\ker(L)) + \dim(\operatorname{im}(L)) = \dim(X). \]


Proof. Let $\dim(\ker(L)) = m$ and let $\{x_1, x_2, \ldots, x_m\}$ be a basis for $\ker(L)$. We may expand this to a basis $\{x_1, x_2, \ldots, x_m, x_{m+1}, \ldots, x_n\}$ for $X$.

Set $y_j = L(x_{m+j})$ for $j = 1, \ldots, n - m$. Since every vector in $X$ is a linear combination of the vectors $x_1, \ldots, x_n$ and $L(x_k) = 0$ for $k = 1, \ldots, m$, we conclude that every vector in $\operatorname{im}(L)$ is a linear combination of the vectors $y_1, \ldots, y_{n-m}$. This set of vectors is linearly independent, since if

\[ a_1 y_1 + a_2 y_2 + \cdots + a_{n-m} y_{n-m} = 0, \]

then $a_1 x_{m+1} + a_2 x_{m+2} + \cdots + a_{n-m} x_n \in \ker(L)$. This implies that there are numbers $b_1, \ldots, b_m$ such that

\[ a_1 x_{m+1} + a_2 x_{m+2} + \cdots + a_{n-m} x_n = b_1 x_1 + b_2 x_2 + \cdots + b_m x_m. \]

However, since $\{x_1, \ldots, x_n\}$ is a linearly independent set, the $a_j$'s and $b_k$'s must all be $0$. The fact that the $a_j$'s must all be $0$ shows that the set $\{y_1, \ldots, y_{n-m}\}$ is linearly independent and, hence, forms a basis for $\operatorname{im}(L)$.

We now have $\dim(X) = n$, $\dim(\ker(L)) = m$, and $\dim(\operatorname{im}(L)) = n - m$. Thus, $\dim(\ker(L)) + \dim(\operatorname{im}(L)) = \dim(X)$, as claimed. □

Definition 8.5.4. Let $A$ be a $q \times p$ matrix and let $L : \mathbb{R}^p \to \mathbb{R}^q$ be the linear transformation it determines. Then $\operatorname{Rank}(A)$ is defined to be $\dim(\operatorname{im}(L))$. Equivalently, by the previous theorem, it is also equal to $p - \dim(\ker(L))$. If $L$ is a linear transformation whose matrix has rank $r$, then we will also say that $L$ has rank $r$.


A submatrix of a matrix $A$ is a matrix obtained from $A$ by deleting some of its rows and columns.


The following is proved in most linear algebra texts. We won't repeat the proof here.


Theorem 8.5.5. The rank of a $q \times p$ matrix $A$ is $r$, where $r \times r$ is the dimension of the largest square submatrix of $A$ with non-zero determinant.


Example 8.5.6. What is the rank of the matrix 




Solution: This matrix has 


(i 4) 


as a $2 \times 2$ submatrix with determinant $-3$. It has no square submatrices of larger dimension. Therefore, the matrix $A$ has rank 2.


Example 8.5.7. What is the rank of the matrix 
In £2. 1 
B=|(2 4 2)? 
Te Sh 2, 
Solution: This matrix also has 
ily 22 
Let 


as a $2 \times 2$ submatrix with determinant $-3$. The only square submatrix of larger dimension is the matrix $B$ itself, and this has determinant $0$. Therefore, the matrix $B$ also has rank 2.


Affine Functions. 
Definition 8.5.8. An affine function $F : \mathbb{R}^p \to \mathbb{R}^q$ is a function of the form

\[ F(x) = b + L(x), \]

where $b \in \mathbb{R}^q$ and $L : \mathbb{R}^p \to \mathbb{R}^q$ is a linear function. The rank of an affine transformation $F$ is the rank of its linear part $L$.


An affine subspace $M$ of $\mathbb{R}^p$ is a translate $b + N$ of a linear subspace $N$ of $\mathbb{R}^p$. In this case, the dimension of $M$ is defined to be the dimension of $N$.


The image of an affine function $F(x) = b + L(x)$ is an affine subspace, that is, it is the translate $b + \operatorname{im}(L)$ of the linear subspace $\operatorname{im}(L)$. The dimension of this subspace is the rank of $L$.


Similarly, if $F(x) = b + L(x)$ is an affine function, then the set of solutions to the vector equation $F(x) = 0$ is also an affine subspace, provided a solution exists. In fact, if $a$ is one such solution (so that $F(a) = b + L(a) = 0$), then $x$ is also a solution if and only if

\[ L(x - a) = -b + b = 0. \]

Hence, $x$ is a solution if and only if $x \in a + \ker(L)$. Thus, the set of solutions of the vector equation $F(x) = 0$ is the translate $a + \ker(L)$ of the linear subspace $\ker(L)$ of $\mathbb{R}^p$ and, hence, is an affine subspace. The dimension of this subspace is $p - \operatorname{Rank}(L)$.

In general, if $M = b + N$ is an affine subspace, with $N$ the corresponding linear subspace, then we will say that $N$ is the subspace of vectors parallel to the affine subspace $M$.


Lines in $\mathbb{R}^p$. Lines in $\mathbb{R}^p$ are one-dimensional affine subspaces of $\mathbb{R}^p$. The above discussion suggests expressing them as either images of rank 1 affine transformations or as kernels of rank $p - 1$ affine transformations with domain $\mathbb{R}^p$.


A rank 1 affine transformation $\gamma : \mathbb{R} \to \mathbb{R}^p$ has the form

\[ \gamma(t) = a + tu. \tag{8.5.2} \]

The image of this transformation is a line which contains the point $a = \gamma(0)$ and is parallel to the vector $u = \gamma(1) - \gamma(0)$.

On the other hand, given a line in $\mathbb{R}^p$, if we choose distinct points $a$ and $b$ on the line and set $u = b - a$, then the image of the affine transformation (8.5.2) is a line which contains both $a = \gamma(0)$ and $b = \gamma(1)$ and, hence, is the line we started with.

Thus, the lines in $\mathbb{R}^p$ are exactly the images of affine transformations of the form (8.5.2). This situation is often expressed as a vector equation

\[ x = a + tu, \]

which describes the points $x$ on the line as the values assumed by the right side of the equation as $t$ ranges over $\mathbb{R}$. This is a parametric vector equation for the line.


In $\mathbb{R}^3$, such an equation takes the form $(x,y,z) = (a_1, a_2, a_3) + t(u_1, u_2, u_3)$, which is equivalent to the system of parametric equations

\begin{align*}
x &= a_1 + tu_1, \\
y &= a_2 + tu_2, \\
z &= a_3 + tu_3.
\end{align*}

Example 8.5.9. Find parametric equations for the line in $\mathbb{R}^3$ which contains the point $(1,0,0)$ and is parallel to the vector $u = (-3, 4, 5)$.


Solution: A parametric vector equation for this line is

\[ (x,y,z) = (1,0,0) + t(-3, 4, 5). \]

The corresponding system of parametric equations is

\begin{align*}
x &= 1 - 3t, \\
y &= 4t, \\
z &= 5t.
\end{align*}


Example 8.5.10. Find parametric equations for the line in $\mathbb{R}^3$ containing the points $(2,1,1)$ and $(5,-1,3)$.

Solution: If we set $u = (5,-1,3) - (2,1,1) = (3,-2,2)$, then the parametric equation for our line in vector form is

\[ (x,y,z) = (2,1,1) + t(3,-2,2) = (2 + 3t, \; 1 - 2t, \; 1 + 2t). \]

This can also be expressed as the system of parametric equations

\begin{align*}
x &= 2 + 3t, \\
y &= 1 - 2t, \\
z &= 1 + 2t.
\end{align*}
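The construction of a parametric line from two points is short enough to write out. This is a small sketch with an illustrative helper name, `line_through`, applied to the two points of Example 8.5.10.

```python
def line_through(a, b):
    """Return the direction u = b - a and the map gamma(t) = a + t*u."""
    u = tuple(bi - ai for ai, bi in zip(a, b))

    def gamma(t):
        return tuple(ai + t * ui for ai, ui in zip(a, u))

    return u, gamma

u, gamma = line_through((2, 1, 1), (5, -1, 3))
print(u)                    # direction vector of the line
print(gamma(0), gamma(1))   # the two given points
print(gamma(2))             # another point on the same line
```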


To express a line in $\mathbb{R}^p$ as the kernel of an affine transformation, we choose a point $a$ on the line and a vector $u$ parallel to the line (we may choose $u = b - a$ where $b$ is a point on the line distinct from $a$). If $A$ is a matrix whose rows form a basis for the linear subspace

\[ \{y \in \mathbb{R}^p : y \cdot u = 0\}, \]

then $A$ is a $(p-1) \times p$ matrix of rank $p - 1$ and $Au = 0$. This means that the kernel of the linear transformation determined by $A$ has dimension 1 and contains $u$. Hence, this kernel is $\{tu : t \in \mathbb{R}\}$. The line $\{a + tu : t \in \mathbb{R}\}$ contains $a$ and is parallel to $u$. Thus, it must be our original line. By the construction of $A$, it also has the form

\[ \{x \in \mathbb{R}^p : A(x - a) = 0\} = \{x \in \mathbb{R}^p : Ax - c = 0\} \quad \text{where } c = Aa. \]

Thus, our line is the kernel of the affine transformation $F$ defined by $F(x) = Ax - c$.


If we apply the above discussion to $\mathbb{R}^3$, we conclude that the typical line in $\mathbb{R}^3$ is the set of solutions $(x,y,z)$ to an equation of the form

\[
\begin{pmatrix} v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix},
\]

where $(v_1, v_2, v_3)$ and $(w_1, w_2, w_3)$ are linearly independent vectors. In other words, it is the set of all simultaneous solutions of the pair of linear equations

\begin{align*}
v_1 x + v_2 y + v_3 z &= c_1, \\
w_1 x + w_2 y + w_3 z &= c_2.
\end{align*}

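Example 8.5.11. Express the line in Example 8.5.10 as the set of solutions of a pair of linear equations.


Solution: We need to find two linearly independent vectors which are orthogonal to $u = (3,-2,2)$. Such a pair is $(2,3,0)$ and $(2,1,-2)$. If we apply the matrix with these two vectors as rows to the vector $a = (2,1,1)$, the result is

\[
\begin{pmatrix} 2 & 3 & 0 \\ 2 & 1 & -2 \end{pmatrix}
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 7 \\ 3 \end{pmatrix}.
\]

Thus, the line is the set of solutions of

\[
\begin{pmatrix} 2 & 3 & 0 \\ 2 & 1 & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 7 \\ 3 \end{pmatrix}.
\]

This is equivalent to the pair of simultaneous equations

\begin{align*}
2x + 3y &= 7, \\
2x + y - 2z &= 3.
\end{align*}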
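The two descriptions of the line in Examples 8.5.10 and 8.5.11 can be checked against each other. The sketch below verifies that sampled points of the parametric form satisfy both linear equations, and that the rows of the matrix are orthogonal to the direction vector. The variable names are illustrative, and integer parameters are used so the arithmetic is exact.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a = (2, 1, 1)                  # point on the line
u = (3, -2, 2)                 # direction of the line
rows = [(2, 3, 0), (2, 1, -2)] # rows of the matrix, each orthogonal to u
c = tuple(dot(r, a) for r in rows)

def gamma(t):
    """Parametric form of the line: a + t*u."""
    return tuple(ai + t * ui for ai, ui in zip(a, u))

print(c)
print(all(tuple(dot(r, gamma(t)) for r in rows) == c for t in range(-5, 6)))
```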


Planes in $\mathbb{R}^p$. A plane in $\mathbb{R}^p$ is a two-dimensional affine subspace of $\mathbb{R}^p$, that is, a translate of a two-dimensional linear subspace of $\mathbb{R}^p$. Such an object can be described as the image of an affine transformation of rank 2 or the kernel of an affine transformation of rank $p - 2$ with domain $\mathbb{R}^p$.


If $u$ and $v$ are linearly independent vectors in $\mathbb{R}^p$, then they form a basis for a two-dimensional linear subspace of $\mathbb{R}^p$. If we translate this subspace by adding $a$ to each of its points, we obtain a plane which contains $a$ and is parallel to $u$ and $v$. It consists of all points of the form

\[ x = a + su + tv; \]

that is, it is the image of the affine transformation $F : \mathbb{R}^2 \to \mathbb{R}^p$ defined by

\[ F(s,t) = a + su + tv. \]

This is the vector-parametric form for the equation of a plane.


In the case where $p = 3$, a vector-parametric equation of a plane has the form

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}
+ s \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}
+ t \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
\]

or, when written as a system of equations,

\begin{align*}
x &= a_1 + su_1 + tv_1, \\
y &= a_2 + su_2 + tv_2, \\
z &= a_3 + su_3 + tv_3.
\end{align*}


Given three points $a$, $b$, $c$ in $\mathbb{R}^p$ which do not lie on the same line, the vectors $u = b - a$ and $v = c - a$ are linearly independent (Exercise 8.5.15). Hence, $a$, $u$, and $v$ determine an affine function $F$ with image a plane, as above. This plane contains the points $a = F(0,0)$, $b = F(1,0)$, and $c = F(0,1)$.


Example 8.5.12. Find parametric equations for the plane that contains the three
points $(1,0,1)$, $(1,1,2)$, $(-1,2,0)$.

Solution: We choose $a = (1,0,1)$, $u = (1,1,2) - (1,0,1) = (0,1,1)$, and
$v = (-1,2,0) - (1,0,1) = (-2,2,-1)$. Then, according to the above discussion, the
plane we seek has parametric equations

\[ x = 1 - 2t, \]
\[ y = s + 2t, \]
\[ z = 1 + s - t. \]


We can also express a plane in $\mathbb{R}^3$ as the kernel of a rank 1 affine transformation
from $\mathbb{R}^3$ to $\mathbb{R}$. If $a = (a_1, a_2, a_3)$ is a fixed point in the plane, $u = (x,y,z)$ is the
general point of the plane, and $v = (v_1, v_2, v_3)$ is a vector perpendicular to the plane,
then $v \cdot (u - a) = 0$. Thus, the plane is the kernel of the affine transformation
$f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(u) = v \cdot u - b$, where $b = v \cdot a$. The equation of the plane
is then

\[ v_1 x + v_2 y + v_3 z = b. \]


Example 8.5.13. Find an equation for the plane of Example 8.5.12.

Solution: We choose $a = (1,0,1)$ as a point in the plane. Now we need a
vector perpendicular to the plane. The vectors $(0,1,1)$ and $(-2,2,-1)$ are parallel
to the plane and so we need to find a vector orthogonal to each of these. In fact,
$(3,2,-2)$ is orthogonal to each of these vectors. Also,

\[ (3,2,-2) \cdot (1,0,1) = 1. \]

Hence, an equation for our plane is

\[ 3x + 2y - 2z = 1. \]
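The perpendicular vector in Example 8.5.13 is stated without derivation. One common way to produce such a vector is the cross product, a technique the text itself does not use; the following is a minimal Python sketch under that assumption. The helper names `cross` and `dot` are illustrative only.

```python
def cross(u, v):
    """Cross product of two vectors in R^3; the result is orthogonal to both."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


# The three points of Example 8.5.12.
points = [(1, 0, 1), (1, 1, 2), (-1, 2, 0)]
a = points[0]
u = tuple(p - q for p, q in zip(points[1], a))   # (0, 1, 1)
v = tuple(p - q for p, q in zip(points[2], a))   # (-2, 2, -1)

n = cross(u, v)                     # a vector perpendicular to the plane
normal = tuple(-c for c in n)       # flip the sign to match the vector in the text
b = dot(normal, a)                  # right-hand side of the plane equation

# Each of the three points should satisfy normal . p = b.
residuals = [dot(normal, p) - b for p in points]
```

Any non-zero multiple of the normal vector gives an equivalent equation, which is why flipping the sign is harmless.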


Exercise Set 8.5


alee 
. What is the rank of the matrix | 2. 3 —1]? 
Ly ¢1, 


2. Do the vectors $(1,2,1)$, $(2,0,1)$, and $(1,-1,1)$ form a basis for $\mathbb{R}^3$? Justify your
answer.

3. Do the vectors $(1,2,1)$, $(2,0,1)$, and $(0,4,1)$ form a basis for $\mathbb{R}^3$? Justify your
answer.


1 


-2 


. What is the rank of the matrix (4 Be A: 


2 3 -6 


. What is the rank of the matrix ( ea i 


6. Find parametric equations for the line in $\mathbb{R}^3$ which contains the point $(1,2,3)$
and is parallel to the vector $(1,1,1)$.


. Find parametric equations for the line in R® containing both (1,1,1) and 


(3:21,3): 


8. Express the line of the previous exercise as the set of simultaneous solutions of
a pair of linear equations.

9. Find parametric equations for the plane that contains the three points $(1,0,-1)$,
$(2,1,2)$, $(-1,2,3)$.

10. Express the plane of the previous exercise as the set of solutions of a linear
equation.

11. Find parametric equations for a line which passes through the origin and is
perpendicular to the plane $x - y + 3z = 5$. Use this line to determine the
distance from the plane to the origin.

12. Find the distance from the line with parametric vector equation $(x,y,z) =
(1 + 2t, 2 - t, 4 + t)$ to the origin.

13. Find a formula for the point on the one-dimensional subspace of $\mathbb{R}^p$ generated
by a non-zero vector $u$ which is closest to the point $a \in \mathbb{R}^p$.

14. Prove that, in $\mathbb{R}^3$, a plane and a line not parallel to it must meet in exactly
one point.

15. Prove that if $a$, $b$, and $c$ are three points in $\mathbb{R}^p$ which do not lie on the same
line, then the vectors $u = b - a$ and $v = c - a$ are linearly independent.




16. Prove that if $M$ is a linear subspace of $\mathbb{R}^p$ and we set
\[ M^\perp = \{ y \in \mathbb{R}^p : y \perp x \text{ for all } x \in M \}, \]
then $M^\perp$ is also a linear subspace of $\mathbb{R}^p$ and every vector $u \in \mathbb{R}^p$ may be
written in a unique way as $u = x + y$ with $x \in M$ and $y \in M^\perp$ (see Definition
7.1.9).


Chapter 9 


Differentiation in 
Several Variables 


The most powerful method available for studying a function in several variables is 
to approximate it locally, near a given point, by an affine function. When this can 
be done, it provides a wealth of information about the original function. Affine 
approximation leads to the definition of the differential of a function of several 
variables. The differential of a function F, when it exists, is a matrix of partial 
derivatives of coordinate functions of F. For this reason, we precede the discussion 
of the differential with a brief review of partial derivatives. 


9.1. Partial Derivatives 


In this section, $f$ will be a real-valued function defined on an open set in $\mathbb{R}^p$.


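Definition 9.1.1. The partial derivative of $f$ with respect to its $j$th variable at
$x = (x_1, \dots, x_p)$ is denoted $\dfrac{\partial f}{\partial x_j}(x)$ and is defined by

\[ \frac{\partial f}{\partial x_j}(x) = \frac{d}{dt} f(x_1, \dots, x_{j-1}, t, x_{j+1}, \dots, x_p)\Big|_{t = x_j}, \]

provided this derivative exists.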


Thus, the partial derivative of a function $f$, with respect to its $j$th variable, at
a point $x$ in its domain is obtained by fixing all of the variables of $f$, except the $j$th
one, at the appropriate values $x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_p$, then differentiating with
respect to the remaining variable and evaluating at $x_j$.


Remark 9.1.2. When it is not necessary to explicitly exhibit the point $x$ at which
the partial derivative is being computed (because it is understood from the context
or because $x$ is a generic point of the domain of $f$), we will simply write $\dfrac{\partial f}{\partial x_j}$ for
the partial derivative of $f$ with respect to its $j$th variable.




Two other notations that are often used for the partial derivative of $f$ with
respect to $x_j$ are $f_{x_j}$ and $f_j$. We won't use these in this text.


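Example 9.1.3. Find the partial derivatives of the function

\[ f(x_1, x_2, x_3, x_4) = x_1^2 + x_1 x_3 - 4 x_2^2 x_4^3. \]

Solution: To find $\dfrac{\partial f}{\partial x_1}$, we consider $x_2, x_3, x_4$ to be fixed constants and we
differentiate with respect to the remaining variable and evaluate at $x_1$. The result
is

\[ \frac{\partial f}{\partial x_1} = 2x_1 + x_3. \]

Similarly, we have

\[ \frac{\partial f}{\partial x_2} = -8 x_2 x_4^3, \qquad \frac{\partial f}{\partial x_3} = x_1, \qquad \frac{\partial f}{\partial x_4} = -12 x_2^2 x_4^2. \]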


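Example 9.1.4. Find the partial derivatives of the function
\[ f(x,y,z) = z^2 \cos xy. \]
Solution: We have

\[ \frac{\partial f}{\partial x} = -y z^2 \sin xy, \qquad \frac{\partial f}{\partial y} = -x z^2 \sin xy, \qquad \frac{\partial f}{\partial z} = 2z \cos xy. \]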


The Partial Derivatives as Limits. If we use the definition of the derivative of
a function of one variable as the limit of a difference quotient, the result is

\[ \frac{\partial f}{\partial x_j}(x_1, \dots, x_p) = \lim_{h \to 0} \frac{f(x_1, \dots, x_j + h, \dots, x_p) - f(x_1, \dots, x_j, \dots, x_p)}{h}. \]

The notation involved in this statement becomes much simpler if we note that the
point $(x_1, \dots, x_j + h, \dots, x_p)$ may be written as $x + he_j$, where $e_j$ is the basis vector
with 1 in the $j$th entry and 0 elsewhere. Then,

\[ \frac{\partial f}{\partial x_j}(x) = \lim_{h \to 0} \frac{f(x + he_j) - f(x)}{h}. \tag{9.1.1} \]


Higher-order Partial Derivatives. The partial derivatives defined so far are
first-order partial derivatives. We define second-order partial derivatives of $f$ in
the following fashion: for indices $i$ and $j$ between 1 and $p$, we set

\[ \frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right). \tag{9.1.2} \]

The meaning of this is as follows: if the partial derivative $\dfrac{\partial f}{\partial x_j}$ exists in a neighborhood
of a point $x \in \mathbb{R}^p$, then we may attempt to take the partial derivative with
respect to $x_i$ of the resulting function at the point $x$. The result, if it exists, is the
right side of the above equation. The expression on the left is the notation that is
commonly used for this second-order partial derivative. In the case where $i = j$,
we modify this notation slightly and write

\[ \frac{\partial^2 f}{\partial x_j^2} = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_j}\right). \]




A useful way to think of this process is as follows: the expression $\dfrac{\partial}{\partial x_j}$ is an operator,
that is, a transformation which takes a function $f$ on an open set $U$ to another
function $\dfrac{\partial}{\partial x_j}(f) = \dfrac{\partial f}{\partial x_j}$ on $U$ (provided this derivative exists on $U$). In fact, this
operator is a linear operator; that is,

\[ \frac{\partial}{\partial x_j}(cf) = c\,\frac{\partial}{\partial x_j}(f) \quad\text{and}\quad \frac{\partial}{\partial x_j}(f + g) = \frac{\partial}{\partial x_j}(f) + \frac{\partial}{\partial x_j}(g). \]

Such operators may be composed; that is, we may first apply one such operator,
$\dfrac{\partial}{\partial x_j}$, to $f$ and then apply another, $\dfrac{\partial}{\partial x_i}$, to the result. In fact, we may
continue to compose such operators, applying one after another, as long as the
resulting function has the appropriate partial derivatives on the given open set.
From this point of view, the second-order partial derivative of (9.1.2) is just the
result of applying to $f$ the second-order differential operator

\[ \frac{\partial^2}{\partial x_i\,\partial x_j} = \frac{\partial}{\partial x_i}\,\frac{\partial}{\partial x_j}. \]


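We may, of course, define higher-order partial differential operators in an analogous
fashion. Given integers $j_1, j_2, \dots, j_m$ between 1 and $p$, we set

\[ \frac{\partial^m}{\partial x_{j_1}\,\partial x_{j_2} \cdots \partial x_{j_m}} = \frac{\partial}{\partial x_{j_1}}\,\frac{\partial}{\partial x_{j_2}} \cdots \frac{\partial}{\partial x_{j_m}}. \]

The resulting operator is a partial differential operator of total degree $m$.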


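Example 9.1.5. Find $\dfrac{\partial^5 f}{\partial x\,\partial y\,\partial z\,\partial y\,\partial x}$ if $f(x,y,z) = x^2y^3z^4 + x^2 + y^4 + xyz$.

Solution: We proceed one derivative at a time:

\[ \text{apply } \frac{\partial}{\partial x}: \quad \frac{\partial f}{\partial x} = 2xy^3z^4 + 2x + yz, \]
\[ \text{apply } \frac{\partial}{\partial y}: \quad \frac{\partial^2 f}{\partial y\,\partial x} = 6xy^2z^4 + z, \]
\[ \text{apply } \frac{\partial}{\partial z}: \quad \frac{\partial^3 f}{\partial z\,\partial y\,\partial x} = 24xy^2z^3 + 1, \]
\[ \text{apply } \frac{\partial}{\partial y}: \quad \frac{\partial^4 f}{\partial y\,\partial z\,\partial y\,\partial x} = 48xyz^3, \]
\[ \text{apply } \frac{\partial}{\partial x}: \quad \frac{\partial^5 f}{\partial x\,\partial y\,\partial z\,\partial y\,\partial x} = 48yz^3. \]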


Equality of Mixed Partials. It is natural to ask whether or not, in a mixed 
higher-order partial derivative, the order in which the derivatives are taken makes 
a difference. Some additional calculation using the previous example (Exercise 
9.1.5) shows that, at least for the function f of that example, the order in which 
the five partial derivative operators are applied makes no difference. This is not 
always the case, but it is the case under rather mild continuity assumptions. When 
it is the case, we may change the order in which the partial derivatives are taken 
so as to collect partial derivatives with respect to the same variable together. For 




example, the fifth-order mixed partial derivative of the previous example can be 
rewritten as 
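\[ \frac{\partial^5 f}{\partial x\,\partial x\,\partial y\,\partial y\,\partial z} = \frac{\partial^5 f}{\partial x^2\,\partial y^2\,\partial z}. \]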
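The claim that the order of differentiation does not matter for the polynomial of Example 9.1.5 can be checked mechanically. The following is a small Python sketch, not part of the text: it represents a polynomial in $x, y, z$ as a dictionary from exponent triples to coefficients, and the names `partial` and `apply_in_order` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

X, Y, Z = 0, 1, 2  # indices of the variables x, y, z


def partial(poly, var):
    """Differentiate a polynomial {(ex, ey, ez): coeff} with respect to one variable."""
    result = defaultdict(int)
    for exps, coeff in poly.items():
        e = exps[var]
        if e == 0:
            continue  # the term does not involve this variable
        new_exps = exps[:var] + (e - 1,) + exps[var + 1:]
        result[new_exps] += coeff * e
    return {k: c for k, c in result.items() if c != 0}


def apply_in_order(poly, order):
    """Apply first-order partial derivatives one after another, first entry first."""
    for var in order:
        poly = partial(poly, var)
    return poly


# f(x, y, z) = x^2 y^3 z^4 + x^2 + y^4 + xyz
f = {(2, 3, 4): 1, (2, 0, 0): 1, (0, 4, 0): 1, (1, 1, 1): 1}

mixed = apply_in_order(f, [X, Y, Z, Y, X])      # the order used in Example 9.1.5
collected = apply_in_order(f, [X, X, Y, Y, Z])  # like variables grouped together
```

Both results are the single monomial with coefficient 48 and exponents $(0, 1, 3)$, that is, $48yz^3$. This is a check for one polynomial only; the general statement needs the continuity hypotheses discussed in the text.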
The next theorem tells us when interchanging the order of a mixed partial 
derivative is legitimate. 


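Theorem 9.1.6. Suppose $f$ is a function defined on an open disc $B_r(a,b) \subset \mathbb{R}^2$.
Also suppose that both first-order partial derivatives exist in $B_r(a,b)$ and that $\dfrac{\partial^2 f}{\partial y\,\partial x}$
exists in $B_r(a,b)$ and is continuous at $(a,b)$. Then $\dfrac{\partial^2 f}{\partial x\,\partial y}$ exists at $(a,b)$ and is equal
to $\dfrac{\partial^2 f}{\partial y\,\partial x}(a,b)$.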


Proof. We introduce a function $\lambda(h,k)$, defined for $(h,k)$ in the disc $B = B_r(0,0)$,
by
\[ \lambda(h,k) = f(a+h, b+k) - f(a+h, b) - f(a, b+k) + f(a,b). \]

It follows from the hypotheses of the theorem that the partial derivative of
$\lambda(h,k)$ with respect to $h$ exists for all $(h,k)$ in the disc $B$. If $(h,k) \in B$, the
rectangle with vertices $(0,0)$, $(0,k)$, $(h,0)$, and $(h,k)$ is also contained in this disc
and so the partial derivative of $\lambda$ with respect to its first variable exists on an open
set containing this rectangle.

Now for fixed $k$,

\[ \lambda(h,k) = g(h) - g(0) \quad\text{where}\quad g(u) = f(a+u, b+k) - f(a+u, b). \]

The function $g$ is differentiable on an open interval containing $[0,h]$, and so we may
apply the Mean Value Theorem to $g$ to conclude there is a number $s \in (0,h)$ such
that $g(h) - g(0) = hg'(s)$. This means

\[ \lambda(h,k) = h\left(\frac{\partial f}{\partial x}(a+s, b+k) - \frac{\partial f}{\partial x}(a+s, b)\right). \tag{9.1.3} \]

Of course, the number $s$ depends on $h$ and $k$.
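Since $\dfrac{\partial^2 f}{\partial y\,\partial x}$ exists in $B_r(a,b)$, $\dfrac{\partial f}{\partial x}$ is a differentiable function of its second variable
there. Hence, we may apply the Mean Value Theorem to this function as well. We
conclude that there is a point $t \in (0,k)$ such that

\[ \frac{\partial f}{\partial x}(a+s, b+k) - \frac{\partial f}{\partial x}(a+s, b) = k\,\frac{\partial^2 f}{\partial y\,\partial x}(a+s, b+t). \tag{9.1.4} \]

Combining (9.1.3) and (9.1.4) yields

\[ \frac{1}{hk}\,\lambda(h,k) = \frac{\partial^2 f}{\partial y\,\partial x}(a+s, b+t). \]

By hypothesis, the second-order partial derivative on the right is continuous at
$(a,b)$. This implies that

\[ \lim_{(h,k) \to (0,0)} \frac{1}{hk}\,\lambda(h,k) = \frac{\partial^2 f}{\partial y\,\partial x}(a,b). \]

This conclusion uses the fact that the point $(a+s, b+t)$ is
closer to $(a,b)$ than the point $(a+h, b+k)$.

We complete the proof by noting that the above limit exists independently of
how $(h,k)$ approaches $(0,0)$. In particular, the result will be the same if we first
let $k$ approach 0 and then $h$. However,

\[ \lim_{h \to 0} \lim_{k \to 0} \frac{1}{hk}\,\lambda(h,k) \]
\[ = \lim_{h \to 0} \lim_{k \to 0} \frac{1}{h}\left(\frac{f(a+h, b+k) - f(a+h, b)}{k} - \frac{f(a, b+k) - f(a,b)}{k}\right) \]
\[ = \lim_{h \to 0} \frac{1}{h}\left(\lim_{k \to 0} \frac{f(a+h, b+k) - f(a+h, b)}{k} - \lim_{k \to 0} \frac{f(a, b+k) - f(a,b)}{k}\right) \]
\[ = \lim_{h \to 0} \frac{1}{h}\left(\frac{\partial f}{\partial y}(a+h, b) - \frac{\partial f}{\partial y}(a,b)\right) \]
\[ = \frac{\partial^2 f}{\partial x\,\partial y}(a,b). \]

Hence, this second-order partial derivative also exists and it equals $\dfrac{\partial^2 f}{\partial y\,\partial x}(a,b)$. Note
that distributing the limit with respect to $k$ across the difference in the second step
above requires that we know that the two limits involved exist. This follows from
the assumption that $\dfrac{\partial f}{\partial y}$ exists in $B_r(a,b)$. □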


Obviously, the same result holds, with the same proof, if $x$ and $y$ are reversed
in the statement of the above theorem. That is, if we assume that either one of the
second-order mixed partials exists in a neighborhood of $(a,b)$ and is continuous at
$(a,b)$, then the other one also exists at $(a,b)$ and the two are equal at $(a,b)$.

The following example shows that the continuity of the mixed partial that is 
assumed to exist is a necessary assumption in the above theorem. 


Example 9.1.7. For the function

\[ f(x,y) = \begin{cases} \dfrac{x^3y - xy^3}{x^2 + y^2} & \text{if } (x,y) \ne (0,0), \\[1ex] 0 & \text{if } (x,y) = (0,0), \end{cases} \]

show that the first-order partial derivatives exist and are continuous everywhere.
Then show that the mixed second-order partial derivatives $\dfrac{\partial^2 f}{\partial x\,\partial y}$ and $\dfrac{\partial^2 f}{\partial y\,\partial x}$ exist
everywhere but they are not equal at $(0,0)$. Why doesn't this contradict the above
theorem?

Solution: Except at the point $(0,0)$ where the denominator vanishes, we may
use the standard rules of differentiation to show that

\[ \frac{\partial f}{\partial x} = \frac{(3x^2y - y^3)(x^2 + y^2) - 2x(x^3y - xy^3)}{(x^2 + y^2)^2}, \qquad \frac{\partial f}{\partial y} = \frac{(x^3 - 3xy^2)(x^2 + y^2) - 2y(x^3y - xy^3)}{(x^2 + y^2)^2}. \tag{9.1.5} \]

These expressions may be differentiated again to show that each of the second-order
partial derivatives also exists, except possibly at $(0,0)$.

In order to calculate $\dfrac{\partial f}{\partial x}(0,0)$, we set $y = 0$ in the expression for $f$. The
resulting function of $x$ is identically 0 and, hence, has derivative 0 with respect
to $x$. Similar reasoning leads to the same conclusion for $\dfrac{\partial f}{\partial y}(0,0)$. Since both
the expressions in (9.1.5) have limit 0 as $(x,y) \to (0,0)$, the first-order partial
derivatives are continuous everywhere, including at $(0,0)$, where they both have
the value 0.

To calculate $\dfrac{\partial^2 f}{\partial x\,\partial y}(0,0)$, we note that $\dfrac{\partial f}{\partial y}(x,0) = x$ for all $x$. Hence, $\dfrac{\partial^2 f}{\partial x\,\partial y}(0,0) = 1$.
On the other hand, $\dfrac{\partial f}{\partial x}(0,y) = -y$, and so $\dfrac{\partial^2 f}{\partial y\,\partial x}(0,0) = -1$.

The two mixed partials are not equal at $(0,0)$ even though they both exist
everywhere. Why doesn't this contradict the previous theorem? It must be the
case that neither of these mixed partial derivatives is continuous at $(0,0)$, a fact
that will be verified in the exercises.
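The two values found in Example 9.1.7 can be reproduced numerically. The sketch below is an illustration rather than part of the text: it codes the two expressions in (9.1.5) as hypothetical functions `fx` and `fy` (with the value 0 assigned at the origin, as computed in the solution) and forms the difference quotients that define the mixed partials at $(0,0)$.

```python
def fx(x, y):
    """Partial derivative of f with respect to x, from (9.1.5); equals 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    r2 = x * x + y * y
    return ((3 * x * x * y - y ** 3) * r2 - 2 * x * (x ** 3 * y - x * y ** 3)) / r2 ** 2


def fy(x, y):
    """Partial derivative of f with respect to y, from (9.1.5); equals 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    r2 = x * x + y * y
    return ((x ** 3 - 3 * x * y * y) * r2 - 2 * y * (x ** 3 * y - x * y ** 3)) / r2 ** 2


h = 2.0 ** -10  # a small step; a power of two keeps the arithmetic clean

# Difference quotient for d/dx of (df/dy) at (0, 0): uses df/dy along the x-axis.
d_xy = (fy(h, 0.0) - fy(0.0, 0.0)) / h

# Difference quotient for d/dy of (df/dx) at (0, 0): uses df/dx along the y-axis.
d_yx = (fx(0.0, h) - fx(0.0, 0.0)) / h
```

Because $\partial f/\partial y(x,0) = x$ and $\partial f/\partial x(0,y) = -y$ exactly, the quotients do not depend on the step size: `d_xy` is 1 and `d_yx` is -1 up to rounding.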


An important hypothesis in many theorems is that a function $f$ belongs to the
class $C^k(U)$ defined below.

Definition 9.1.8. If $U$ is an open subset of $\mathbb{R}^p$, then a function $F : U \to \mathbb{R}^q$ is
said to be $C^k$ on $U$ if, for each coordinate function $f_i$ of $F$, all partial derivatives
of $f_i$ of total order less than or equal to $k$ exist and are continuous on $U$.

Functions which are $C^1$ on $U$ will be called smooth functions on $U$.


By using Theorem 9.1.6 to interchange pairs of adjacent first-order partial differential
operators, the following theorem may be proved:

Theorem 9.1.9. If a real-valued function $f$ is $C^k$ on $U \subset \mathbb{R}^p$ and $m \le k$, then the
$m$th-order partial derivative $\dfrac{\partial^m f}{\partial x_{j_1} \cdots \partial x_{j_m}}$ is independent of the order in which the
first-order partial derivatives $\dfrac{\partial}{\partial x_{j_i}}$ are applied.


Exercise Set 9.1 


1. If $f(x,y) = \sqrt{x^2 + y^2}$, find $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$. Are there any points in the plane
where they don't exist?
2. If f(e,y) = ey? +ry +-y%, find all first- and second-order partial derivatives of 
fs 
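3. If $f(x,y) = x \cos y$, find $\dfrac{\partial f}{\partial x}$, $\dfrac{\partial f}{\partial y}$, $\dfrac{\partial^2 f}{\partial x\,\partial y}$, and $\dfrac{\partial^2 f}{\partial y\,\partial x}$.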


oats of of OF Of 
4. If f(x,y) =e™siny, find ~, —, ,and ——*_, 
f(x,y) =e sin y, fin an’ By’ Ox0y an ayde 




5. If $f$ is the function of Example 9.1.5, directly calculate
\[ \frac{\partial^5 f}{\partial x^2\,\partial y^2\,\partial z}. \]
Verify that it is the same as the mixed partial derivative of $f$ calculated in the
example.


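6. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable on $\mathbb{R}$ and define a function $g : \mathbb{R}^2 \to \mathbb{R}$ by
$g(x,y) = f(x + y)$. Use (9.1.1) to show that $\dfrac{\partial g}{\partial x} = \dfrac{\partial g}{\partial y}$ on $\mathbb{R}^2$.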


. Theorem 9.1.6 is a statement about a function of two variables. Show how it 


can be applied several times in a step-by-step procedure to prove that if U Cc R° 
and if f is @ on U, then 


if (x,y) 4 (0,8), 
if (w,y) = (0,0). 


0. 
For which values of p is - continuous at (0,0)? 
ae 


9. If $f$ is the function of Example 9.1.7, show by direct calculation that $\dfrac{\partial^2 f}{\partial x\,\partial y}$
is not continuous at $(0,0)$. A similar calculation shows that $\dfrac{\partial^2 f}{\partial y\,\partial x}$ is not continuous
at $(0,0)$ (you need not do both calculations).
10. If $f$ is defined on $\mathbb{R}^2$ by


f(@,y) = 


show that both $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ exist everywhere but they are not continuous at
$(0,0)$. In fact, $f$ itself is not continuous at $(0,0)$ (see Example 8.1.3).


9.2. The Differential 


Let $f$ be a real-valued function defined on an interval on the line. Recall that
the equation of the tangent line to the curve $y = f(x)$ at a point $a$ where $f$ is
differentiable is

\[ y = f(a) + f'(a)(x - a). \]

This is the equation of the line which best approximates the curve when $x$ is near
$a$. The right side is an affine function,

\[ T(x) = f(a) + f'(a)(x - a), \]




of $x$. What is special about $T$ that makes its graph the line which best approximates
the curve $y = f(x)$ near $a$? For convenience of notation let $h = x - a$, so that
$x = a + h$. Then
\[ f(a+h) - T(a+h) = f(a+h) - f(a) - f'(a)h \]
and so

\[ \lim_{h \to 0} \frac{f(a+h) - T(a+h)}{h} = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} - f'(a) = 0. \]

In other words, not only do $f$ and $T$ have the same value at $a$, but as $h$ approaches
0, the difference between $f(a+h)$ and $T(a+h)$ approaches zero faster than $h$ does.
No affine function other than $T$ has this property (Exercise 9.2.7).


Example 9.2.1. What is the best affine approximation to $f(x) = x^3 - 2x + 1$ at
the point $(2,5)$?

Solution: Here, $a = 2$, $f(a) = 5$, and $f'(a) = f'(2) = 10$, so the best affine
approximation to $f(x)$ at $x = 2$ is $T(x) = 5 + 10(x - 2) = 10x - 15$.
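The phrase "approaches zero faster than $h$" can be made concrete for Example 9.2.1. The Python sketch below is only an illustration; it takes $f(x) = x^3 - 2x + 1$, the cubic consistent with the values $f(2) = 5$ and $f'(2) = 10$ used in the solution, and the name `scaled_error` is a hypothetical helper.

```python
def f(x):
    """The function of Example 9.2.1 (assumed to be x^3 - 2x + 1)."""
    return x ** 3 - 2 * x + 1


def T(x):
    """The affine approximation found in the solution."""
    return 10 * x - 15


def scaled_error(h, a=2.0):
    """The quotient (f(a+h) - T(a+h)) / h, which should tend to 0 with h."""
    return (f(a + h) - T(a + h)) / h


steps = [10.0 ** -k for k in range(1, 6)]
ratios = [scaled_error(h) for h in steps]
```

Expanding by hand gives $f(2+h) - T(2+h) = 6h^2 + h^3$, so each ratio is $6h + h^2$ up to rounding, and the ratios shrink in proportion to $h$.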


Affine Approximation in Several Variables. By analogy with the single vari- 
able case, if F : D — R@ is a function defined on a subset D of R?, then the best 
affine approximation to F at a € D would be an affine function T : R? + R? such 
that F(a +h) —T(a+h) goes to @ faster than h as the vector h approaches 0. In 
order for this to make sense at all, a must be a limit point of D and, in fact, we 
will require that a be an interior point of D. This ensures that there is an open 
ball, centered at a, which is contained in D. 

It must also be the case that F and its affine approximation T have the same 
value at a. However, if T is affine, then T(a +h) = b+ L(a+h) where L: RP > R? 
is linear and b € R? is a constant. If T(a) = F(a), then 6 = T(a) — L(a) and so T 
has the form T(a + h) = F(a) — L(a) + L(a +h). Then, since L is linear, 


T(a+h) = F(a) + L(h). 


A function which has a best affine approximation at $a$ is said to be differentiable
at $a$. The precise definition of this concept is as follows:

Definition 9.2.2. Let $F : D \to \mathbb{R}^q$ be a function with domain $D \subset \mathbb{R}^p$, and let $a$
be an interior point of $D$. We say that $F$ is differentiable at $a$ if there is a linear
function $L : \mathbb{R}^p \to \mathbb{R}^q$ such that

\[ \lim_{h \to 0} \frac{F(a+h) - F(a) - L(h)}{\|h\|} = 0. \tag{9.2.1} \]

In this case, we call the linear function $L$ the differential of $F$ at $a$ and denote
it by $dF(a)$.

Just as in the single variable case, if $F$ is differentiable, then the function
\[ T(x) = F(a) + dF(a)(x - a) \]
is the best affine approximation to $F(x)$ for $x$ near $a$.


Also, as in the single variable case, differentiability implies continuity. We state 
this in the following theorem, the proof of which is left to the exercises. 




Theorem 9.2.3. If $F : D \to \mathbb{R}^q$ is differentiable at $a \in D$, then $F$ is continuous
at $a$.


Example 9.2.4. Let $F$ be the function from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by
\[ F(x,y) = (x^2 + y^2, xy). \]

Show that $F$ is differentiable at $(1,2)$ and its differential is the linear function with
matrix
\[ A = \begin{pmatrix} 2 & 4 \\ 2 & 1 \end{pmatrix}. \]

Find the affine function which best approximates $F$ near $(1,2)$.
Solution: With $a = (1,2)$ and $h = (x - 1, y - 2) = (s,t)$, we have $F(a) = (5,2)$
and
\[ F(a+h) - F(a) - Ah = ((1+s)^2 + (2+t)^2 - 5 - 2s - 4t,\ (1+s)(2+t) - 2 - 2s - t) = (s^2 + t^2, st). \]
Thus, the error $F(a+h) - F(a) - Ah$ if $F(a+h)$ is approximated by $F(a) + Ah$ is
$(s^2 + t^2, st)$. Then,
\[ \|F(a+h) - F(a) - Ah\|^2 = (s^2 + t^2)^2 + (st)^2 \le 2\|h\|^4. \]

This implies

\[ \frac{\|F(a+h) - F(a) - Ah\|}{\|h\|} \le \sqrt{2}\,\|h\|, \]

which has limit 0 as $h \to 0$. This shows that $F$ is differentiable at $(1,2)$ and that
$dF(1,2) = A$.

The best affine approximation to $F(x,y)$ near $(1,2)$ is

\[ T(x,y) = (5,2) + \begin{pmatrix} 2 & 4 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x - 1 \\ y - 2 \end{pmatrix} \]
\[ = (5 + 2(x-1) + 4(y-2),\ 2 + 2(x-1) + (y-2)) \]
\[ = (-5 + 2x + 4y,\ -2 + 2x + y). \]
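The estimate in Example 9.2.4 can be spot-checked numerically. The following Python sketch is illustrative only: `error_ratio` is a hypothetical helper that evaluates $\|F(a+h) - T(a+h)\| / \|h\|$ at $a = (1,2)$, and the sample displacements are arbitrary choices.

```python
import math


def F(x, y):
    """The function of Example 9.2.4."""
    return (x * x + y * y, x * y)


def T(x, y):
    """The affine approximation to F near (1, 2) found in the example."""
    return (-5 + 2 * x + 4 * y, -2 + 2 * x + y)


def error_ratio(s, t):
    """||F(a+h) - T(a+h)|| / ||h|| for a = (1, 2) and h = (s, t)."""
    fx, fy = F(1 + s, 2 + t)
    tx, ty = T(1 + s, 2 + t)
    return math.hypot(fx - tx, fy - ty) / math.hypot(s, t)


# A few displacements h = (s, t) of decreasing size.
samples = [(0.1, 0.1), (0.01, -0.02), (-0.001, 0.003), (1e-4, 1e-4)]
checks = [(error_ratio(s, t), math.sqrt(2) * math.hypot(s, t)) for s, t in samples]
```

Each computed ratio should lie below the bound $\sqrt{2}\,\|h\|$ from the example, and both shrink as $h$ does.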


The Differential Matrix. Let $F : D \to \mathbb{R}^q$ be a function with $D \subset \mathbb{R}^p$ and $a$ an
interior point of $D$. If $F$ is differentiable at $a$, then it is easy to compute the matrix
$(c_{ij})$ of its differential $dF(a)$. This is called the differential matrix of $F$ at $a$. As
usual, we will tend to ignore the technical difference between the linear function
$dF(a)$ and its corresponding matrix (see Remark 8.4.13).

We suppose that $F(x) = (f_1(x), f_2(x), \dots, f_q(x))$, so that $f_i$ is the $i$th coordinate
function of $F$. For $j = 1, \dots, p$, we apply (9.2.1) in the special case in which $h$
approaches 0 along the line $h = te_j$, that is, along the $j$th coordinate axis. Since
the vector expression in (9.2.1) converges to 0, the same thing is true of each of its
coordinate functions. This means,

\[ \lim_{t \to 0} \frac{f_i(a + te_j) - f_i(a) - c_{ij}t}{t} = 0, \]

which implies
\[ c_{ij} = \lim_{t \to 0} \frac{f_i(a + te_j) - f_i(a)}{t}. \]

The limit that appears in this equation is just the partial derivative $\dfrac{\partial f_i}{\partial x_j}(a)$
of $f_i$ with respect to its $j$th variable at the point $a$. This is true for each $i$ and each
$j$. Thus, we have proved the following theorem.


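Theorem 9.2.5. If $F : D \to \mathbb{R}^q$ is differentiable at an interior point $a$ of $D \subset \mathbb{R}^p$,
then its differential at $a$ is the linear function $dF(a) : \mathbb{R}^p \to \mathbb{R}^q$ with matrix

\[ \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1}(a) & \dfrac{\partial f_1}{\partial x_2}(a) & \cdots & \dfrac{\partial f_1}{\partial x_p}(a) \\ \dfrac{\partial f_2}{\partial x_1}(a) & \dfrac{\partial f_2}{\partial x_2}(a) & \cdots & \dfrac{\partial f_2}{\partial x_p}(a) \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_q}{\partial x_1}(a) & \dfrac{\partial f_q}{\partial x_2}(a) & \cdots & \dfrac{\partial f_q}{\partial x_p}(a) \end{pmatrix}. \tag{9.2.2} \]

If $F$ is defined and differentiable at all points of an open set $U \subset \mathbb{R}^p$, then
we say that $F$ is differentiable on $U$. Its differential $dF$ is then a function on $U$
whose values are linear transformations from $\mathbb{R}^p$ to $\mathbb{R}^q$. Equivalently, its differential
matrix $dF$ is a $q \times p$ matrix whose entries are functions on $U$.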


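Example 9.2.6. Assuming that the function $F$ of Example 9.2.4 is differentiable
everywhere, find its differential matrix. Verify that, at $a = (1,2)$, it is the matrix
$A$ of the example.

Solution: The coordinate functions for $F$ are given by $f_1(x,y) = x^2 + y^2$ and
$f_2(x,y) = xy$. The point $a$ in this example is $a = (1,2)$. The partial derivatives of
$f_1$ and $f_2$ are

\[ \frac{\partial f_1}{\partial x} = 2x, \qquad \frac{\partial f_1}{\partial y} = 2y, \qquad \frac{\partial f_2}{\partial x} = y, \qquad \frac{\partial f_2}{\partial y} = x. \]

Thus, the differential matrix at a general point $(x,y)$ is

\[ \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}. \]

At the particular point $a = (1,2)$, this is

\[ \begin{pmatrix} 2 & 4 \\ 2 & 1 \end{pmatrix}. \]

This is, indeed, the matrix $A$ of Example 9.2.4.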
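The differential matrix of Theorem 9.2.5 can also be approximated entry by entry from difference quotients. The sketch below is an illustration, not the text's method: it uses symmetric (central) difference quotients rather than the one-sided limit in the text, and `numeric_jacobian` is a hypothetical helper name.

```python
def numeric_jacobian(F, a, h=1e-6):
    """Approximate the q x p matrix of partial derivatives of F at the point a.

    Entry (i, j) is a central difference quotient of the i-th coordinate
    function in the direction of the j-th basis vector.
    """
    p = len(a)
    q = len(F(a))
    J = [[0.0] * p for _ in range(q)]
    for j in range(p):
        plus = list(a)
        minus = list(a)
        plus[j] += h
        minus[j] -= h
        F_plus, F_minus = F(plus), F(minus)
        for i in range(q):
            J[i][j] = (F_plus[i] - F_minus[i]) / (2 * h)
    return J


def F(v):
    """The function of Example 9.2.4, taking a point (x, y) as a sequence."""
    x, y = v
    return (x * x + y * y, x * y)


J = numeric_jacobian(F, (1.0, 2.0))
A = [[2.0, 4.0], [2.0, 1.0]]  # the matrix found in Examples 9.2.4 and 9.2.6
```

A matrix of partial derivatives computed this way says nothing about whether $F$ is differentiable; it only estimates what the differential matrix must be if $F$ is differentiable.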




A Condition for Differentiability. Since the vector function in (9.2.1) has limit
0 if and only if each of its coordinate functions has limit 0, we have the following
theorem.

Theorem 9.2.7. If $D \subset \mathbb{R}^p$ and if $F = (f_1, \dots, f_q) : D \to \mathbb{R}^q$ is a function, then
$F$ is differentiable at $a \in D$ if and only if, for each $i$, the coordinate function $f_i$ is
differentiable at $a$. In this case, the differential matrix $dF$ is the matrix whose $i$th
row is the differential $df_i$ of the coordinate function $f_i$.

This result allows us to reduce the proof of the next theorem to the case $q = 1$.

Theorem 9.2.8. Let $F = (f_1, \dots, f_q) : U \to \mathbb{R}^q$ be a function defined on an open
subset $U$ of $\mathbb{R}^p$. If each first-order partial derivative of each coordinate function $f_i$
exists on $U$, then $F$ is differentiable at each point of $U$ where these partial derivatives
are all continuous. Thus, if $F$ is $C^1$ on all of $U$, then $F$ is differentiable on all of
$U$.


Proof. By the previous theorem, it is enough to prove that each of the coordinate
functions of $F$ is differentiable at the point in question. Hence, it is enough to
prove the theorem in the case $q = 1$. To complete the proof, we will prove the
following statement by induction on $p$: if $f$ is a real-valued function defined on an
open set $U \subset \mathbb{R}^p$ and each first-order partial derivative of $f$ exists on $U$, then $f$ is
differentiable at each point of $U$ where all of these partial derivatives are continuous.

If $p = 1$, then the hypothesis implies, in particular, that $f$ has a derivative
at each point of $U$. For a function of one variable, this means the function is
differentiable at each point of $U$. This completes the base case of the induction
argument.


We now assume our statement is true for functions of $p$ variables and let $f$ be a
function of $p + 1$ variables. We write points of $\mathbb{R}^{p+1}$ in the form $(x,y)$ with $x \in \mathbb{R}^p$
and $y \in \mathbb{R}$. For some $a = (a_1, \dots, a_p) \in \mathbb{R}^p$ and $b \in \mathbb{R}$ we suppose $(a,b)$ is a point
of $U$ at which the first-order partial derivatives of $f$ are all continuous.

If $h = (h_1, \dots, h_p) \in \mathbb{R}^p$ and $k \in \mathbb{R}$, then

\[ f(a+h, b+k) - f(a,b) = f(a+h, b) - f(a,b) + f(a+h, b+k) - f(a+h, b). \]

If we set $g(x) = f(x,b)$ for $x$ in an appropriate neighborhood of $a$ in $\mathbb{R}^p$ and use
the Mean Value Theorem in the last variable on the last two terms above, then this
becomes
\[ f(a+h, b+k) - f(a,b) = g(a+h) - g(a) + \frac{\partial f}{\partial y}(a+h, c)\,k, \tag{9.2.3} \]
for some $c$ between $b$ and $b + k$.


Since $g$ is a function of $p$ variables which satisfies the hypotheses of the theorem,
$g$ is differentiable at $a$ by our induction assumption. Hence, $dg(a)$ exists and

\[ \lim_{h \to 0} \frac{g(a+h) - g(a) - dg(a)h}{\|h\|} = 0. \]

Because $\|h\| \le \|(h,k)\|$, this implies

\[ \lim_{(h,k) \to (0,0)} \frac{g(a+h) - g(a) - dg(a)h}{\|(h,k)\|} = 0. \tag{9.2.4} \]

Since $\dfrac{\partial f}{\partial y}$ is continuous at $(a,b)$, $|k| \le \|(h,k)\|$, and $(a+h, c) \to (a,b)$ as
$(h,k) \to (0,0)$, we also have

\[ \lim_{(h,k) \to (0,0)} \frac{1}{\|(h,k)\|}\left(\frac{\partial f}{\partial y}(a+h, c) - \frac{\partial f}{\partial y}(a,b)\right) k = 0. \tag{9.2.5} \]

Let $v$ be the vector whose first $p$ components are the components of $dg(a)$ and
whose last component is $\dfrac{\partial f}{\partial y}(a,b)$. Then, by (9.2.3),

\[ f(a+h, b+k) - f(a,b) - v \cdot (h,k) = g(a+h) - g(a) - dg(a)h + \left(\frac{\partial f}{\partial y}(a+h, c) - \frac{\partial f}{\partial y}(a,b)\right) k. \tag{9.2.6} \]

On combining (9.2.4), (9.2.5), and (9.2.6), we conclude that

\[ \lim_{(h,k) \to (0,0)} \frac{f(a+h, b+k) - f(a,b) - v \cdot (h,k)}{\|(h,k)\|} = 0 \]

and, hence, that $f$ is differentiable at $(a,b)$ with differential $v$. This completes the
induction and finishes the proof of the theorem. □
Example 9.2.9. Show that the function $F : \mathbb{R}^2 \to \mathbb{R}^3$ defined by

\[ F(x,y) = (xe^y, ye^x, xy) \]
is differentiable everywhere, and then find its differential matrix.

Solution: The first-order partial derivatives of the coordinate functions of $F$
exist and are continuous everywhere. Hence, $F$ is differentiable everywhere by the
previous theorem. Its differential matrix is

\[ dF(x,y) = \begin{pmatrix} e^y & xe^y \\ ye^x & e^x \\ y & x \end{pmatrix}. \]


A Function Which Is Not Differentiable. The existence of the first-order par- 
tial derivatives is not, by itself, enough to ensure that a function is differentiable. 
This is demonstrated by the next example. 


Example 9.2.10. Show that the function $f$ defined by
\[ f(x,y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x,y) \ne (0,0), \\[1ex] 0 & \text{if } (x,y) = (0,0) \end{cases} \]
is not differentiable at $(0,0)$ even though its first-order partial derivatives exist
everywhere.

Solution: This is a rational function with a denominator which vanishes only
at $(0,0)$. Hence, its first-order partial derivatives exist everywhere except possibly
at $(0,0)$. However, $f$ is identically 0 on both coordinate axes (that is, $f(x,0) = 0 =
f(0,y)$). Hence, both $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ exist at $(0,0)$ and equal 0. However, $f$ is
clearly not differentiable at $(0,0)$, since it is not even continuous at this point (see
Example 8.1.3).
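The behavior described in Example 9.2.10 is easy to observe numerically. The following Python sketch is an illustration only: it evaluates the difference quotients along the coordinate axes, which vanish, and then samples $f$ along the diagonal $y = x$, where the values stay at $1/2$ no matter how close the point is to the origin. The step size and sample points are arbitrary choices.

```python
def f(x, y):
    """The function of Example 9.2.10."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)


h = 1e-6

# Difference quotients along the axes: both partial derivatives at (0, 0) are 0.
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h

# Values along the diagonal y = x do not approach f(0, 0) = 0.
diagonal = [f(t, t) for t in (1e-1, 1e-3, 1e-6)]
```

Since the diagonal values do not tend to $f(0,0)$, the function is not continuous at the origin, and so by Theorem 9.2.3 it cannot be differentiable there.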


Exercise Set 9.2 


1. If $L : \mathbb{R}^p \to \mathbb{R}^q$ is a linear function, show that $dL = L$. In other words, if $L$ has
matrix $A$, then $A$ is the differential matrix of the linear function $L(x) = Ax$.


2. Find the best affine approximation near $(0,0)$ to the function $F : \mathbb{R}^2 \to \mathbb{R}^2$
defined by

\[ F(x,y) = (xy - 2x + y + 1,\ x^2 + y^2 + x - 3y + 6). \]


3. If $F$ is the function of the previous exercise, find the best affine approximation
to $F$ near $(1,-1)$.

4. Find the differential matrix for the function $G : \mathbb{R}^+ \times \mathbb{R} \to \mathbb{R}^3$ defined by

\[ G(x,y) = (y \ln x,\ xe^y,\ \sin xy). \]

Then find the best affine approximation to $G$ at the point $(1,\pi)$.


. Find the differential of the real-valued function f(x,y,z) = cy* cosaz. Then 


find the best affine approximation to f at the point (1,1,7/2). 


6. Find the differential of the curve $\gamma(t) = (\sin(2\pi t), \cos(2\pi t), t^2)$. Then find the
best affine approximation to the curve $\gamma$ at the point $t = 1$.

7. Prove that if $f$ is a real-valued function defined on an open subinterval of $\mathbb{R}$
containing $a$ and if $S$ is an affine function such that $f(a) = S(a)$ and

\[ \lim_{h \to 0} \frac{f(a+h) - S(a+h)}{h} = 0, \]

then $S(a+h) = f(a) + f'(a)h$.

8. Prove that if $U$ is a neighborhood of 0 in $\mathbb{R}^p$ and if $F : U \to \mathbb{R}^q$ is a function
such that $F(0) = 0$, then $F$ is differentiable at 0 with $dF(0) = 0$ if and only if
$\lim_{x \to 0} \|F(x)\| / \|x\| = 0$.

9. Prove Theorem 9.2.3. That is, prove that if a function is differentiable at a
point in its domain, then it is continuous at that point.


Does the function defined by 


if (ey) 4 (00), 
if (xy) = (00) 


have first-order partial derivatives at every point of R2? Is this function differ- 
entiable at (@,0)? Give reasons for your answers. 

11. If $f : \mathbb{R}^p \to \mathbb{R}$ is differentiable at $a \in \mathbb{R}^p$, then show that, for each $h \in \mathbb{R}^p$, the
function $g : \mathbb{R} \to \mathbb{R}$ defined by $g(t) = f(a + th)$ has a derivative at $t = 0$. Can
you compute it in terms of $df(a)$ and $h$?




12. Prove that a function $F : \mathbb{R}^p \to \mathbb{R}^q$ is affine if and only if it is differentiable
everywhere and its differential matrix is constant.


9.3. The Chain Rule 


The differential of a function of several variables has properties similar to those of 
the derivative of a real-valued function of a single variable. The simplest of these 
are stated in the following theorem, whose proof is left to the exercises. 


Theorem 9.3.1. Suppose $F$ and $G$ are functions defined on an open set $U \subset \mathbb{R}^p$,
with values in $\mathbb{R}^q$, and $c$ is a scalar. If $F$ and $G$ are differentiable at a point $x \in U$,
then

(a) $cF$ is differentiable at $x$ and $d(cF)(x) = c\,dF(x)$; and
(b) $F + G$ is differentiable at $x$ and $d(F + G)(x) = dF(x) + dG(x)$.

A result which is more difficult to prove, but which is of great importance, is
the Chain Rule for functions of several variables. The proof becomes considerably
simpler if we reformulate the concept of differentiability in the following way.


An Equivalent Formulation of Differentiability. If $f$ is a real-valued function
defined on an open interval containing the point $a \in \mathbb{R}$, then we can always express
$f(a+h) - f(a)$ for $h$ near but not equal to 0 in the following way:

\[ f(a+h) - f(a) = q(h)h, \tag{9.3.1} \]
where $q(h)$ is just the difference quotient

\[ q(h) = \frac{f(a+h) - f(a)}{h}. \]

Of course, $f$ is differentiable at $a$ if and only if $q$ has a limit as $h \to 0$. The
derivative is then defined to be this limit. The function $q$ becomes continuous at 0
if it is given the value $f'(a)$ at $h = 0$ and then (9.3.1) holds at $h = 0$ as well as at all
nearby points. In fact, the differentiability of $f$ at $a$ is equivalent to the existence
of a function which satisfies (9.3.1) and is continuous at $h = 0$. This suggests the
following reformulation of the definition of differentiability.


Theorem 9.3.2. Let $F$ be a function defined on an open set $U \subset \mathbb{R}^p$ with values
in $\mathbb{R}^q$ and let $a$ be a point of $U$. Then $F$ is differentiable at $a$ if and only if there is
a $q \times p$ matrix-valued function $Q(h)$, defined in a neighborhood of 0, such that $Q$ is
continuous at 0 and $F(a+h) - F(a)$ is the vector-matrix product

\[ F(a+h) - F(a) = Q(h)h \]
for all $h$ in a neighborhood of 0. If this condition holds, then $dF(a) = Q(0)$.
Proof. Suppose a matrix Q with the required properties exists on some neighbor- 
hood V of @. Then, for h € V, 
F(a+h) — F(a) —Q()h — QAhJh—Q(O)h__ (Qh) — Q(0))h 


Ir l| » IlAll . (rll 


9.3. The Chain Rule 237 


This expression has norm less than or equal to ||Q(h) — Q(0)]||, which converges to 
@ as h — @, since Q is continuous at @ Thus, F is differentiable and its differential 
matrix is Q(0). 
Conversely, suppose F is differentiable at a. If we set

\[ \epsilon(h) = F(a+h) - F(a) - dF(a)h, \]

then $\epsilon$ is a function on a neighborhood of 0 with values in $\mathbb{R}^q$ and

\[ \lim_{h \to 0} \frac{\epsilon(h)}{\|h\|} = 0. \]

If, when written out in terms of coordinate functions, $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_q)$ and
$h = (h_1, h_2, \ldots, h_p)$, then we define a q × p matrix A(h) by

\[ A(h) = \|h\|^{-2} \begin{pmatrix} \epsilon_1 h_1 & \epsilon_1 h_2 & \cdots & \epsilon_1 h_p \\ \epsilon_2 h_1 & \epsilon_2 h_2 & \cdots & \epsilon_2 h_p \\ \vdots & \vdots & & \vdots \\ \epsilon_q h_1 & \epsilon_q h_2 & \cdots & \epsilon_q h_p \end{pmatrix}. \]

This is a matrix-valued function of h, defined on a neighborhood of 0, except at 0
itself. Moreover, if we define this function to be 0 when h = 0, then it becomes
continuous at h = 0, since

\[ \frac{|\epsilon_i(h) h_j|}{\|h\|^2} \le \frac{\|\epsilon(h)\|\,\|h\|}{\|h\|^2} = \frac{\|\epsilon(h)\|}{\|h\|} \]

and this has limit 0 as $h \to 0$. Note also that if we apply the matrix A(h) to the
vector h, the result is

\[ A(h)h = \epsilon(h). \]

Thus, if we set

\[ Q(h) = dF(a) + A(h), \]

then Q is continuous at h = 0, Q(0) = dF(a), and

\[ F(a+h) - F(a) = dF(a)h + \epsilon(h) = dF(a)h + A(h)h = Q(h)h. \]

This completes the proof. □
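
The construction of Q(h) in the proof can be checked numerically. The following is a minimal Python sketch, assuming an illustrative map F(x, y) = (xy, x + y²) and base point a = (1, 2); the map, the point, and the helper names `Q` and `mat_vec` are choices made here for illustration and do not come from the text.

```python
def F(x, y):
    # Hypothetical example map from R^2 to R^2 (not from the text).
    return (x * y, x + y * y)

def dF(x, y):
    # Differential matrix of the example map, computed by hand.
    return [[y, x], [1.0, 2.0 * y]]

def Q(a, h):
    """Q(h) = dF(a) + A(h), where A(h) = ||h||^-2 (eps_i(h) h_j) as in the proof."""
    J = dF(*a)
    n2 = h[0] ** 2 + h[1] ** 2
    if n2 == 0.0:
        return J  # Q(0) = dF(a)
    Fa = F(*a)
    Fah = F(a[0] + h[0], a[1] + h[1])
    # eps(h) = F(a + h) - F(a) - dF(a) h
    eps = [Fah[i] - Fa[i] - (J[i][0] * h[0] + J[i][1] * h[1]) for i in range(2)]
    return [[J[i][j] + eps[i] * h[j] / n2 for j in range(2)] for i in range(2)]

def mat_vec(M, h):
    # Product of a 2 x 2 matrix with a vector in R^2.
    return [M[i][0] * h[0] + M[i][1] * h[1] for i in range(2)]

a = (1.0, 2.0)
h = (0.3, -0.2)
lhs = [u - v for u, v in zip(F(a[0] + h[0], a[1] + h[1]), F(*a))]  # F(a+h) - F(a)
rhs = mat_vec(Q(a, h), h)                                          # Q(h) h
```

Up to rounding, `lhs` and `rhs` agree for any h, and the entries of `Q(a, h)` approach those of `dF(*a)` as h shrinks, which is the continuity of Q at 0.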


The Chain Rule. After the above reformulation of differentiability, the Chain 
Rule has a simple proof. 


Theorem 9.3.3. Let U and V be open subsets of $\mathbb{R}^n$ and $\mathbb{R}^p$, respectively, and let
$G : U \to \mathbb{R}^p$ and $F : V \to \mathbb{R}^q$ be functions with $G(U) \subset V$. Suppose $a \in U$, G is
differentiable at a, and F is differentiable at b = G(a). Then $F \circ G$ is differentiable
at a and

\[ d(F \circ G)(a) = dF(G(a))\,dG(a). \]


Proof. By the previous theorem, there are matrix-valued functions $Q_G$ and $Q_F$,
defined in neighborhoods of 0 in $\mathbb{R}^n$ and $\mathbb{R}^p$, respectively, each continuous at 0, with
$Q_F(0) = dF(b)$, $Q_G(0) = dG(a)$, and such that

\[ G(a+h) - G(a) = Q_G(h)h \quad\text{and}\quad F(b+k) - F(b) = Q_F(k)k \]

for h and k in appropriate neighborhoods of 0. Then, since G(a) = b,

\[ F \circ G(a+h) - F \circ G(a) = F(b + Q_G(h)h) - F(b) = Q_F(Q_G(h)h)\,Q_G(h)\,h. \]

Since $Q_G$ and $Q_F$ are both continuous at 0, we have

\[ \lim_{h \to 0} Q_F(Q_G(h)h)\,Q_G(h) = Q_F(0)\,Q_G(0) = dF(b)\,dG(a) = dF(G(a))\,dG(a). \]

Thus, if we choose $Q_{F \circ G}(h)$ to be $Q_F(Q_G(h)h)\,Q_G(h)$, it satisfies the conditions
of the previous theorem with F replaced by $F \circ G$ and, hence, by that theorem,
$d(F \circ G)(a)$ exists and equals dF(G(a)) dG(a). □
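Example 9.3.4. Let f(x, y) be a real-valued function of two variables and let

\[ g(r, s, t) = f(r(s+t),\, r(s-t)). \]

Find dg(1, 2, 1) if $\frac{\partial f}{\partial x}(3,1) = 4$ and $\frac{\partial f}{\partial y}(3,1) = -5$.

Solution: The function g is just $f \circ G$, where $G : \mathbb{R}^3 \to \mathbb{R}^2$ is defined by

\[ G(r, s, t) = (r(s+t),\, r(s-t)). \]

We have G(1, 2, 1) = (3, 1) and

\[ dG(r, s, t) = \begin{pmatrix} s+t & r & r \\ s-t & r & -r \end{pmatrix}. \]

Thus, dg(1, 2, 1) = df(G(1, 2, 1)) dG(1, 2, 1) is

\[ \left( \frac{\partial f}{\partial x}(3,1),\ \frac{\partial f}{\partial y}(3,1) \right) \begin{pmatrix} 3 & 1 & 1 \\ 1 & 1 & -1 \end{pmatrix} = (4, -5) \begin{pmatrix} 3 & 1 & 1 \\ 1 & 1 & -1 \end{pmatrix} = (7, -1, 9). \]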
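
The matrix product in this solution can be compared with a direct finite-difference computation. The sketch below is only an illustration: the example specifies f just through the values 4 and -5 of its partial derivatives at (3, 1), so the particular f used here, and the helper `gradient`, are hypothetical choices consistent with those values.

```python
def f(x, y):
    # Hypothetical f with f_x(3, 1) = 4 and f_y(3, 1) = -5 (the values used in the solution).
    return 4.0 * x - 5.0 * y + (x - 3.0) ** 2 * (y - 1.0)

def g(r, s, t):
    # The composite g = f o G, with G(r, s, t) = (r(s + t), r(s - t)).
    return f(r * (s + t), r * (s - t))

def gradient(func, p, step=1e-6):
    # Central-difference approximation to the row vector of partial derivatives.
    grad = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += step
        dn[i] -= step
        grad.append((func(*up) - func(*dn)) / (2.0 * step))
    return grad

numeric_dg = gradient(g, (1.0, 2.0, 1.0))

# Chain Rule: df(3, 1) times dG(1, 2, 1).
df = (4.0, -5.0)
dG = [[3.0, 1.0, 1.0], [1.0, 1.0, -1.0]]
chain_dg = [df[0] * dG[0][j] + df[1] * dG[1][j] for j in range(3)]
```

Here `chain_dg` is the row vector obtained from the Chain Rule, and `numeric_dg` should agree with it to within the accuracy of the difference quotient.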


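Example 9.3.5. If $F(x, y) = (f_1(x, y), f_2(x, y))$ is a differentiable function from
$\mathbb{R}^2$ to $\mathbb{R}^2$ and we define $G : \mathbb{R}^2 \to \mathbb{R}^2$ by $G(s, t) = F(s^2 + t^2, s^2 - t^2)$, find an
expression for the differential matrix of G in terms of the partial derivatives of $f_1$
and $f_2$.

Solution: The function G is $F \circ H$, where $H(s, t) = (s^2 + t^2, s^2 - t^2)$. The
differential matrices of F and H are

\[ dF = \begin{pmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\ \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \end{pmatrix} \quad\text{and}\quad dH = \begin{pmatrix} 2s & 2t \\ 2s & -2t \end{pmatrix}. \]

By the Chain Rule,

\[ dG(s,t) = d(F \circ H)(s,t) = dF(H(s,t))\,dH(s,t) = \begin{pmatrix} 2s\left(\dfrac{\partial f_1}{\partial x} + \dfrac{\partial f_1}{\partial y}\right) & 2t\left(\dfrac{\partial f_1}{\partial x} - \dfrac{\partial f_1}{\partial y}\right) \\ 2s\left(\dfrac{\partial f_2}{\partial x} + \dfrac{\partial f_2}{\partial y}\right) & 2t\left(\dfrac{\partial f_2}{\partial x} - \dfrac{\partial f_2}{\partial y}\right) \end{pmatrix}, \]

where the partial derivatives of $f_1$ and $f_2$ are to be evaluated at the point
$H(s,t) = (s^2 + t^2, s^2 - t^2)$.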




Differential of an Inner Product. The following theorem is a nice application 
of the Chain Rule. 


Theorem 9.3.6. Suppose F and G are functions defined in a neighborhood of a
point $a \in \mathbb{R}^p$ and with values in $\mathbb{R}^q$. If F and G are both differentiable at a, then
$F \cdot G$ is also differentiable at a and

\[ d(F \cdot G)(a) = G(a)\,dF(a) + F(a)\,dG(a), \]

where each of the products on the right is the matrix product of a 1 × q matrix times
a q × p matrix.


Proof. Let $H : \mathbb{R}^{2q} \to \mathbb{R}$ be defined by

\[ H(u, v) = u \cdot v, \]

where, if $u = (u_1, \ldots, u_q)$ and $v = (v_1, \ldots, v_q)$ are vectors in $\mathbb{R}^q$, then (u, v) denotes
the vector $(u_1, \ldots, u_q, v_1, \ldots, v_q)$ in $\mathbb{R}^{2q}$.

Now $F \cdot G = H \circ (F, G)$, where (F, G) denotes the function with values in $\mathbb{R}^{2q}$
whose first q coordinate functions are the coordinate functions of F and whose last
q coordinate functions are the coordinate functions of G.

The function H is differentiable everywhere because the terms $u_j v_j$ of the sum
$H(u, v) = u_1 v_1 + \cdots + u_q v_q$ have continuous partial derivatives everywhere. That is,

\[ \frac{\partial (u_j v_j)}{\partial u_j} = v_j, \qquad \frac{\partial (u_j v_j)}{\partial v_j} = u_j, \]

and all other first-order partial derivatives are zero. This means that the differential
of H is the 1 × 2q matrix

\[ dH(u, v) = (v_1, \ldots, v_q, u_1, \ldots, u_q). \]

Since F and G are differentiable at a, the coordinate functions of both are
all differentiable at a. This implies that the function (F, G) is differentiable at a,
since each of its coordinate functions is a coordinate function of F or a coordinate
function of G. Furthermore,

\[ d(F, G)(a) = \begin{pmatrix} dF(a) \\ dG(a) \end{pmatrix}, \]

where the matrix on the right has as its first q rows the rows of dF(a) and as its last
q rows the rows of dG(a).

By the Chain Rule,

\[ d(F \cdot G)(a) = dH(F(a), G(a))\,d(F, G)(a) = \big(G(a),\ F(a)\big) \begin{pmatrix} dF(a) \\ dG(a) \end{pmatrix} = G(a)\,dF(a) + F(a)\,dG(a). \]

□




Dependent Variable Notation. A notation that is often used in connection
with differentiation, and specifically the Chain Rule, is one which emphasizes the
variables in a problem, some of which depend on others through functional relations,
but which de-emphasizes the functions defining these relations. In this notation,
a function F of p variables with values in $\mathbb{R}^q$ determines a vector of q dependent
variables

\[ u = (u_1, u_2, \ldots, u_q) \]

which depend on a vector of p variables

\[ x = (x_1, x_2, \ldots, x_p) \]

through the relation u = F(x). The differential matrix is then the matrix

\[ \left( \frac{\partial u_i}{\partial x_j} \right)_{ij} = \begin{pmatrix} \dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_1}{\partial x_2} & \cdots & \dfrac{\partial u_1}{\partial x_p} \\ \dfrac{\partial u_2}{\partial x_1} & \dfrac{\partial u_2}{\partial x_2} & \cdots & \dfrac{\partial u_2}{\partial x_p} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial u_q}{\partial x_1} & \dfrac{\partial u_q}{\partial x_2} & \cdots & \dfrac{\partial u_q}{\partial x_p} \end{pmatrix}, \]

where $\frac{\partial u_i}{\partial x_j}$ is understood to be the partial derivative $\frac{\partial f_i}{\partial x_j}$ of the ith coordinate
function of F evaluated at a generic point x of the domain of F.

Now the variables $x_j$ themselves may depend on a vector of variables

\[ t = (t_1, t_2, \ldots, t_n) \]

through a function G. The differential matrix for this relationship would be the
matrix

\[ \left( \frac{\partial x_j}{\partial t_k} \right)_{jk}. \]

Since the variables $u_i$ depend on the variables $x_j$, which in turn depend on the
variables $t_k$, the variables $u_i$ also depend on the variables $t_k$ (through the function
$F \circ G$), and the differential matrix for this relationship is denoted

\[ \left( \frac{\partial u_i}{\partial t_k} \right)_{ik}. \]

Using this notation, the Chain Rule becomes

(9.3.2)    \[ \left( \frac{\partial u_i}{\partial t_k} \right)_{ik} = \left( \frac{\partial u_i}{\partial x_j} \right)_{ij} \left( \frac{\partial x_j}{\partial t_k} \right)_{jk}, \]

where the expression on the right is the product of the indicated matrices. This
product will involve the variables $x_j$ as well as the variables $t_k$, and it is important
to remember that the $x_j$'s are themselves functions of the variables $t_k$.




A Change of Variables. 


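Example 9.3.7. If u = f(x, y) expresses the variable u as a function of Cartesian
coordinates (x, y) on an open subset of the plane, what is the relationship between
the differential matrix of u as a function of (x, y) and its differential matrix as
a function of the corresponding polar coordinates $(r, \theta)$, where $x = r\cos\theta$ and
$y = r\sin\theta$?

Solution: The change of coordinate transformation $(x, y) = (r\cos\theta, r\sin\theta)$
has differential matrix

\[ \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}. \]

Thus,

\[ \left( \frac{\partial u}{\partial r},\ \frac{\partial u}{\partial \theta} \right) = \left( \frac{\partial u}{\partial x},\ \frac{\partial u}{\partial y} \right) \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \]

or

\[ \frac{\partial u}{\partial r} = \cos\theta\,\frac{\partial u}{\partial x} + \sin\theta\,\frac{\partial u}{\partial y}, \qquad \frac{\partial u}{\partial \theta} = -r\sin\theta\,\frac{\partial u}{\partial x} + r\cos\theta\,\frac{\partial u}{\partial y}. \]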
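
The polar-coordinate formulas of this example can be tested on a concrete function. The following is a minimal Python sketch under stated assumptions: the function u = f(x, y) = x²y + sin y, the sample point, and the helper names are hypothetical and are used only to compare the Chain Rule expressions with difference quotients.

```python
import math

def f(x, y):
    # Hypothetical u = f(x, y), chosen only for illustration.
    return x * x * y + math.sin(y)

def fx(x, y):
    return 2.0 * x * y             # partial derivative of f with respect to x

def fy(x, y):
    return x * x + math.cos(y)     # partial derivative of f with respect to y

def u_polar(r, theta):
    # u expressed in polar coordinates.
    return f(r * math.cos(theta), r * math.sin(theta))

def chain_rule_polar(r, theta):
    # The two formulas from the example, evaluated at (x, y) = (r cos t, r sin t).
    x, y = r * math.cos(theta), r * math.sin(theta)
    du_dr = math.cos(theta) * fx(x, y) + math.sin(theta) * fy(x, y)
    du_dtheta = -r * math.sin(theta) * fx(x, y) + r * math.cos(theta) * fy(x, y)
    return du_dr, du_dtheta

def central_diff(func, t, step=1e-6):
    return (func(t + step) - func(t - step)) / (2.0 * step)

r0, th0 = 2.0, 0.7
num_dr = central_diff(lambda r: u_polar(r, th0), r0)
num_dth = central_diff(lambda th: u_polar(r0, th), th0)
cr_dr, cr_dth = chain_rule_polar(r0, th0)
```

The pairs (`num_dr`, `cr_dr`) and (`num_dth`, `cr_dth`) should agree to within the error of the difference quotients.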


——— 
Exercise Set 9.3 


1. If F is a function from an open subset U of $\mathbb{R}^p$ to $\mathbb{R}^q$ which is differentiable
at a and if B is an r × q matrix, then show that d(BF)(a) = B dF(a). Here,
BF(x) is the matrix B applied to the vector F(x) and B dF(a) is the product
of the matrix B and the matrix dF(a).

2. If f(x, y) is a differentiable function of $(x, y) \in \mathbb{R}^2$ and g(t) = f(tx, ty) for all
$t \in \mathbb{R}$, find g'(1) in terms of the partial derivatives of f.

3. An n-homogeneous function on $\mathbb{R}^2$ is a function that satisfies $f(tx, ty) =
t^n f(x, y)$ for all $t \in \mathbb{R}$ and $(x, y) \in \mathbb{R}^2$. Show that a differentiable function
on $\mathbb{R}^2$ is n-homogeneous if and only if it satisfies the differential equation

\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = nf \]

at each $(x, y) \in \mathbb{R}^2$.
4. If f is a differentiable function on R and g(x,y) = f(y), show that 
9 
ei 


5. If f and g are twice differentiable functions on $\mathbb{R}$ and

\[ h(x, y) = f(x - y) + g(x + y), \]

show that h satisfies the wave equation:

\[ \frac{\partial^2 h}{\partial x^2} = \frac{\partial^2 h}{\partial y^2}. \]


6. If u is a variable which is a differentiable function of (x, y) in an open set
$U \subset \mathbb{R}^2$, if x and y are differentiable functions of $(s, t) \in V$ for an open set
$V \subset \mathbb{R}^2$, and if $(x, y) \in U$ whenever $(s, t) \in V$, then use the Chain Rule to
obtain expressions for $\frac{\partial u}{\partial s}$ and $\frac{\partial u}{\partial t}$ on V in terms of the partial derivatives of u
with respect to x and y and the partial derivatives of x and y with respect to
s and t.

7. Do the preceding exercise in the special case where

\[ x = as + bt \quad\text{and}\quad y = cs + dt \]

for some constants a, b, c, d.

8. If $F(x, y) = (f_1(x, y), f_2(x, y))$ is a differentiable function from $\mathbb{R}^2$ to $\mathbb{R}^2$ and
if we define $G : \mathbb{R}^2 \to \mathbb{R}^2$ by G(s, t) = F(st, s + t), find an expression for the
differential matrix of G in terms of the partial derivatives of $f_1$ and $f_2$.

9. If (x, y, z) are the Cartesian coordinates of a point in $\mathbb{R}^3$ and the spherical
coordinates of the same point are $r, \theta, \phi$, then

\[ x = r\cos\theta\sin\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\phi. \]

Let u be a variable which is a differentiable function of (x, y, z) on $\mathbb{R}^3$. Find
a formula for the partial derivatives of u with respect to $r, \theta, \phi$ in terms of its
partial derivatives with respect to x, y, z.

10. Suppose U and V are open subsets of $\mathbb{R}^p$ and $F : U \to V$ has an inverse function
$G : V \to U$. This means $F \circ G(y) = y$ for all $y \in V$ and $G \circ F(x) = x$ for all
$x \in U$. Show that if F is differentiable on U and G is differentiable on V, then
dF(x) is non-singular at each $x \in U$, and for each $x \in U$,

\[ dF(x)^{-1} = dG(y) \quad\text{where}\quad y = F(x). \]

11. Show that if F is a differentiable function on an open set $U \subset \mathbb{R}^p$ with values
in $\mathbb{R}^q$, then the real-valued function $\|F(x)\|^2$ on U has zero differential at x if
and only if the vector F(x) is orthogonal to each of the columns of dF(x).

12. Prove Theorem 9.3.1.

13. If $f(x, y) = x^2 + y^2$, find a 1 × 2 matrix-valued function Q which satisfies the
conclusion of Theorem 9.3.2 for f.

14. In the proof of Theorem 9.3.3, the following fact is used twice: if A(h) is a
q × p matrix whose entries are functions of $h \in \mathbb{R}^p$ and if A(h) is continuous at
h = 0, then $\lim_{h \to 0} A(h)h = 0$, where A(h)h is the result of the matrix A(h)
acting via vector-matrix product on the vector h. Prove that this limit is 0, as
claimed.


9.4. Applications of the Chain Rule 


The Gradient. The case q = 1 is of special interest in our discussion of the
differential. In this case, we are dealing with a real-valued function f on a domain
$D \subset \mathbb{R}^p$. At any point x where the function f is differentiable, its differential matrix
is a 1 × p matrix, that is, a row vector

\[ df = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_p} \right). \]

The resulting vector is called the gradient of f at x. It is sometimes denoted $\nabla f$
and sometimes denoted grad f.


If $f(x_1, \ldots, x_p)$ is the function f with its argument written out in terms of
coordinates, then a notation often used for df is

(9.4.1)    \[ df = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_p}\,dx_p. \]

The interpretation of this is as follows: it is understood that df and the partial
derivatives in this equation are evaluated at some generic point x of the domain of f.
For each j, $dx_j$ is the differential of the jth coordinate function $x_j$ on $\mathbb{R}^p$. As such, it
is the linear transformation from $\mathbb{R}^p$ to $\mathbb{R}$ which sends a vector $(v_1, \ldots, v_p) \in \mathbb{R}^p$ to
its jth component $v_j$. As a row vector, it is the vector which has 1 as jth component
and 0 for all other components. Earlier we called this vector $e_j$, but in the context
of differentials it is common to call it $dx_j$. Equation (9.4.1) expresses the fact that,
for each function f as above, df at a given point is a linear combination of the basis
elements $dx_j$ with the coefficients being the corresponding partial derivatives of f
at that point.


Example 9.4.1. If $f(x, y, z) = z^2 + \sin xy$, find the gradient of f at a generic point
(x, y, z) and at the particular point (1, 0, 3).
Solution: At (x, y, z) the gradient of f is

\[ df = (y\cos xy,\ x\cos xy,\ 2z). \]

At (x, y, z) = (1, 0, 3) this is the vector (0, 1, 6). In terms of the basis vectors
dx, dy, dz, we have

\[ df = y\cos xy\,dx + x\cos xy\,dy + 2z\,dz, \]

which, at (x, y, z) = (1, 0, 3), is dy + 6 dz.

Directional Derivatives. We specify a direction in $\mathbb{R}^p$ by specifying a unit
vector (vector of length 1) that points in this direction. For example, in $\mathbb{R}^2$ we may
specify a direction by specifying an angle $\theta$ relative to the positive x-axis, but this is
equivalent to specifying the unit vector $(\cos\theta, \sin\theta)$ which points in this direction.

Given a function f, defined on a neighborhood of a point $a \in \mathbb{R}^p$, each first-order
partial derivative of f at a is defined by restricting f to a line through a
parallel to one of the coordinate axes and differentiating the resulting function of
one variable. However, there is nothing special about the coordinate axes. We
may restrict f to a line in any direction through a and differentiate the resulting
function of one variable. This leads to the concept of directional derivative.

Definition 9.4.2. Suppose f is a function defined in a neighborhood of $a \in \mathbb{R}^p$
and u is a unit vector in $\mathbb{R}^p$. The directional derivative of f at a, in the direction
u, is defined to be

\[ D_u f(a) = \frac{d}{dt} f(a + tu)\Big|_{t=0}. \]




If f happens to be differentiable at a, then its directional derivatives all exist 
and are easily calculated. 


Theorem 9.4.3. Suppose f is a function defined in a neighborhood of $a \in \mathbb{R}^p$
and differentiable at a. If u is a unit vector in $\mathbb{R}^p$, then the directional derivative
$D_u f(a)$ exists and

\[ D_u f(a) = df(a)u. \]

Proof. If $g : \mathbb{R} \to \mathbb{R}^p$ is defined by g(t) = a + tu, then dg(t) = g'(t) = u and
$D_u f(a) = d(f \circ g)(0)$. The Chain Rule implies that this exists and is equal to
df(a) dg(0) = df(a)u. □


The directional derivative $D_u f(a)$ represents the rate of change of f as we pass
through a in the direction specified by u. If this is positive, then it represents the
rate of increase of f in the u direction as we pass through a.

The proof of the following theorem is left to the exercises.

Theorem 9.4.4. Suppose f is a real-valued function which is defined and differentiable
in a neighborhood of $a \in \mathbb{R}^p$, and suppose that $df(a) \ne 0$. Then the gradient
df(a) points in the direction of greatest increase for f at a; that is, $D_u f(a)$ has
its maximum value when the unit vector u is a positive scalar multiple of df(a).


Example 9.4.5. If $f(x, y) = 2 - x^2 - y^2$, find the direction of greatest increase of
f at (1, 1) and the rate of increase of f in this direction at (1, 1).
Solution: The gradient of f is

\[ df(x, y) = (-2x, -2y). \]

At (1, 1) this is

\[ df(1, 1) = (-2, -2). \]

A unit vector which points in the same direction is $u = (-1/\sqrt{2}, -1/\sqrt{2})$. The
directional derivative in the direction of u is

\[ D_u f(1, 1) = df(1, 1) \cdot u = \sqrt{2} + \sqrt{2} = 2\sqrt{2}. \]
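
As a numerical illustration of this example and of Theorem 9.4.4, the sketch below approximates the directional derivative of $f(x, y) = 2 - x^2 - y^2$ at (1, 1) by a difference quotient, first in the gradient direction and then in 360 sampled unit directions. The helper name `directional_derivative` and the sampling scheme are assumptions made here, not part of the text.

```python
import math

def f(x, y):
    return 2.0 - x * x - y * y

def directional_derivative(func, a, u, step=1e-6):
    # Central-difference approximation to d/dt func(a + t u) at t = 0.
    fwd = func(a[0] + step * u[0], a[1] + step * u[1])
    bwd = func(a[0] - step * u[0], a[1] - step * u[1])
    return (fwd - bwd) / (2.0 * step)

a = (1.0, 1.0)
grad = (-2.0, -2.0)                        # df(1, 1) from the example
norm = math.hypot(grad[0], grad[1])
u_star = (grad[0] / norm, grad[1] / norm)  # unit vector in the gradient direction
best = directional_derivative(f, a, u_star)

# Directional derivatives in unit directions at one-degree spacing.
samples = [
    directional_derivative(
        f, a, (math.cos(math.radians(k)), math.sin(math.radians(k)))
    )
    for k in range(360)
]
```

The value `best` should be close to $2\sqrt{2}$, and no sampled direction should give a larger rate of increase.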


The Derivative of a Curve. Another special case of importance in the study
of differentials is the case of a curve in $\mathbb{R}^q$, that is, a function

\[ \gamma(t) = (\gamma_1(t), \gamma_2(t), \ldots, \gamma_q(t)), \]

defined on an interval $I \subset \mathbb{R}$, with values in $\mathbb{R}^q$. In this case, the differential matrix
$d\gamma$, at an interior point of I, is a q × 1 matrix, that is, a column vector. This is
the column vector obtained by transposing the vector

\[ \gamma'(t) = (\gamma_1'(t), \gamma_2'(t), \ldots, \gamma_q'(t)) \]

of derivatives of the coordinate functions of $\gamma$.

If $a \in I$, the best affine approximation to $\gamma(t)$ for t near a is the function

\[ \tau(t) = \gamma(a) + \gamma'(a)(t - a). \]

Assuming $\gamma'(a) \ne 0$, this is a parametric equation for a line through $b = \gamma(a)$ which
is parallel to the vector $\gamma'(a)$. If one more restriction on the curve $\gamma$ is met, this
line will be called the tangent line to the curve at $\gamma(a)$.


The additional restriction needed on $\gamma$ is that a is the only point on the interval
I at which $\gamma$ has the value b. Otherwise, the curve crosses itself at b and the tangent
line to the curve at b is not well defined: there is a different tangent line for each
branch of the curve passing through b (see Figure 9.4.1). In this case, we will say
that b is a crossing point for $\gamma$. Crossing points can be eliminated by replacing the
interval I with a smaller open interval, containing a, but no other points at which
$\gamma$ has the value $\gamma(a)$. In our continuing discussion of curves and their tangent lines,
we will assume that $\gamma(a)$ is not a crossing point of $\gamma$. This assumption and the
assumption that $\gamma'(a) \ne 0$ ensure that $\gamma$ has a well-defined tangent line at $\gamma(a)$.

Note that each point $\tau(t)$ which is on the tangent line and sufficiently close to
$\gamma(a)$ determines a parameter value $t \in I$ and this, in turn, determines a point $\gamma(t)$
on the curve. The two points $\gamma(t)$ and $\tau(t)$ differ from one another by

\[ \gamma(t) - \gamma(a) - \gamma'(a)(t - a) \]

and the norm of this vector approaches 0 faster than t - a approaches 0 as $t \to a$.
This justifies the claim that the curve $\gamma$ and the line $\tau$ are tangent at the point
$\gamma(a)$. Note, however, that this line of reasoning is only valid if $\gamma'(a) \ne 0$, since,
otherwise, $\tau$ is constant and fails to determine a non-degenerate line.

If $\gamma'(a) \ne 0$, the vector

\[ T(a) = \frac{\gamma'(a)}{\|\gamma'(a)\|} \]

is a unit vector (a vector of length one) which is parallel to the tangent line at $\gamma(a)$.
It is called the tangent vector to the curve at $\gamma(a)$.

The vector $\gamma'(a)$ is sometimes called the velocity vector of the curve at $\gamma(a)$,
since it does represent velocity in the case where the curve is describing the motion
of a body through space.


Example 9.4.6. The parameterized curve $\gamma(t) = (\cos t, \sin 2t)$, $0 < t < 2\pi$, passes
through the origin. At the origin, find its velocity vector, tangent vector, and
tangent line. Do the same exercise if the domain of $\gamma$ is restricted to $(0, \pi)$.

Solution: The origin is a crossing point for this curve (see Figure 9.4.1). The
curve passes through the origin when $t = \pi/2$ and when $t = 3\pi/2$. Thus, there is
no well-defined velocity vector, tangent vector, or tangent line. If we restrict the
domain of $\gamma$ to the interval $(0, \pi)$, then the effect is to choose one branch of the
curve and the crossing is eliminated. Then the curve passes through (0, 0) only at
$t = \pi/2$. We have

\[ \gamma'(t) = (-\sin t,\ 2\cos 2t) \quad\text{and}\quad \gamma'(\pi/2) = (-1, -2). \]

Hence, the velocity vector at (0, 0) is $\gamma'(\pi/2) = (-1, -2)$, the tangent vector at this
point is

\[ \frac{\gamma'(\pi/2)}{\|\gamma'(\pi/2)\|} = \left( \frac{-1}{\sqrt{5}},\ \frac{-2}{\sqrt{5}} \right), \]

and a parametric equation for the tangent line to this curve at (0, 0) is

\[ \tau(t) = (0, 0) + (t - \pi/2)(-1, -2) = (\pi/2 - t,\ \pi - 2t). \]

If we define the domain of $\gamma$ to be $(\pi, 2\pi)$, then we are choosing the other
branch of the curve, the one which passes through (0, 0) at $t = 3\pi/2$. We leave
the problem of finding the tangent line to the curve at this point to the exercises.
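
The tangency claim for the branch through $t = \pi/2$ can be illustrated numerically: the distance between the curve point and the corresponding point on the line, divided by $|t - \pi/2|$, should shrink to 0. The following Python sketch uses the curve and the tangent line through the origin with direction (-1, -2) obtained from the velocity vector computed above; the function names and the sample values of t are assumptions made for illustration.

```python
import math

def gamma(t):
    # The curve of Example 9.4.6.
    return (math.cos(t), math.sin(2.0 * t))

def tau(t):
    # Tangent line at the origin for the branch through t = pi/2:
    # (0, 0) + (t - pi/2)(-1, -2).
    return (math.pi / 2.0 - t, math.pi - 2.0 * t)

def gap_ratio(t):
    # ||gamma(t) - tau(t)|| divided by |t - pi/2|.
    g, l = gamma(t), tau(t)
    return math.hypot(g[0] - l[0], g[1] - l[1]) / abs(t - math.pi / 2.0)

# Ratios for t = pi/2 + 10^-1, ..., pi/2 + 10^-4.
ratios = [gap_ratio(math.pi / 2.0 + 10.0 ** (-k)) for k in range(1, 5)]
```

The entries of `ratios` decrease toward 0, which is the sense in which the norm of the difference approaches 0 faster than $t - \pi/2$.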




Figure 9.4.1. Curve with a Crossing Point. 


Higher-dimensional Tangent Spaces. The following discussion is a higher-dimensional
version of the above discussion of curves and tangent lines. Suppose
p < q, $U \subset \mathbb{R}^p$ is open, and $F : U \to \mathbb{R}^q$ is a smooth function. Since dF is a q × p
matrix at each point of U and p < q, the maximal possible rank of dF is p. Suppose
$a \in U$ is a point at which dF has rank p. Then the function

(9.4.2)    \[ \Phi(x) = F(a) + dF(a)(x - a) \]

is an affine function of rank p (Definition 8.5.8). This implies that its image is a
p-dimensional affine subspace of $\mathbb{R}^q$ (a translate of a p-dimensional linear subspace).
Each point in this subspace which is sufficiently near F(a) is $\Phi(x)$ for some $x \in U$
and, for such a point, there is a corresponding point F(x) in the image of F. Now
$\Phi$ is the best affine approximation to F near a and so the norm of

\[ F(x) - \Phi(x) = F(x) - F(a) - dF(a)(x - a) \]

approaches 0 faster than $\|x - a\|$ approaches 0 as $x \to a$. This justifies calling the
image of $\Phi$ the tangent space to the image of F at F(a). At least this is the case
if a is the only point in U at which F has the value F(a) (so that F(a) is not a
crossing point of F). The situation described in this discussion is important enough
to warrant a definition.


A function F, defined on U, is one-to-one if there are no two distinct points of 
U at which F has the same value. 


Definition 9.4.7. With p < q, let U be an open subset of $\mathbb{R}^p$ and let $F : U \to \mathbb{R}^q$
be a one-to-one smooth function on U such that dF(a) has rank p at each point
$a \in U$. Then we will call the image S of F a smoothly parameterized p-surface in
$\mathbb{R}^q$ and we will say that F is a smooth parameterization of S.

We define the tangent space of S at each $b = F(a) \in S$ to be the affine subspace
of $\mathbb{R}^q$ which is the image of the function $\Phi$ of (9.4.2).

In the case where p = q - 1, a p-surface in $\mathbb{R}^q$ is called a hypersurface in $\mathbb{R}^q$ and
its tangent space at b = F(a) is its tangent hyperplane at b. If q = 3 and p = 2,
then a 2-surface in $\mathbb{R}^3$ is just a surface and its tangent space at b is its tangent
plane at b.

Figure 9.4.2. A Cone in $\mathbb{R}^3$.


Example 9.4.8. With $a = r_0\cos\theta_0$, $b = r_0\sin\theta_0$, and $r_0 > 0$, find the tangent
plane at $(a, b, r_0)$ to the cone in $\mathbb{R}^3$ parameterized by the function G defined by

\[ G(r, \theta) = (r\cos\theta,\ r\sin\theta,\ r). \]

Is there a point on the cone where the tangent plane is not defined?
Solution: The differential dG at $(r_0, \theta_0)$ is

\[ \begin{pmatrix} \cos\theta_0 & -r_0\sin\theta_0 \\ \sin\theta_0 & r_0\cos\theta_0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} a/r_0 & -b \\ b/r_0 & a \\ 1 & 0 \end{pmatrix}. \]

If $r_0 \ne 0$, this matrix has rank 2. It defines a parameterized plane by

\[ \Phi(r, \theta) = \begin{pmatrix} a \\ b \\ r_0 \end{pmatrix} + \begin{pmatrix} a/r_0 & -b \\ b/r_0 & a \\ 1 & 0 \end{pmatrix} \begin{pmatrix} r - r_0 \\ \theta - \theta_0 \end{pmatrix} \]

or

\[ \Phi(r, \theta) = \big( ar/r_0 - b(\theta - \theta_0),\ br/r_0 + a(\theta - \theta_0),\ r \big). \]

There is no tangent plane to the cone at the origin. The differential of G at this
point has rank 1 rather than rank 2 and the origin is a crossing point, which means
that G does not satisfy the conditions of Definition 9.4.7. In fact, it is apparent
from Figure 9.4.2 that there is no parameterization of the cone in a neighborhood
of the origin that will make it a smooth p-surface and no reasonable candidate for
a tangent plane.
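
A short numerical check of the tangent plane of this example: for $(r, \theta)$ near $(r_0, \theta_0)$, the distance between the cone point $G(r, \theta)$ and the plane point, divided by the distance from $(r, \theta)$ to $(r_0, \theta_0)$, should tend to 0. In the Python sketch below, the particular values $r_0 = 2$ and $\theta_0 = 0.5$, the diagonal approach direction, and the function names are assumptions chosen for illustration.

```python
import math

def G(r, theta):
    # Parameterization of the cone from the example.
    return (r * math.cos(theta), r * math.sin(theta), r)

def Phi(r, theta, r0, theta0):
    # Parameterized tangent plane at G(r0, theta0), as in the solution above.
    a, b = r0 * math.cos(theta0), r0 * math.sin(theta0)
    return (a * r / r0 - b * (theta - theta0),
            b * r / r0 + a * (theta - theta0),
            r)

def ratio(s, r0=2.0, theta0=0.5):
    # ||G - Phi|| at (r0 + s, theta0 + s), divided by ||(s, s)||.
    r, th = r0 + s, theta0 + s
    g, p = G(r, th), Phi(r, th, r0, theta0)
    dist = math.sqrt(sum((gi - qi) ** 2 for gi, qi in zip(g, p)))
    return dist / math.hypot(s, s)

ratios = [ratio(10.0 ** (-k)) for k in range(1, 5)]
```

The ratios shrink roughly in proportion to s, consistent with the plane being the best affine approximation to the cone near a point with $r_0 > 0$.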


Level Sets. If $F : U \to \mathbb{R}^d$ is a function defined on an open subset U of $\mathbb{R}^q$,
then a level set for F is a set of the form

\[ S = \{ y \in U : F(y) = c \}, \]

where c is a constant vector in $\mathbb{R}^d$. By subtracting c from F, we can always arrange
things so that S is the subset of U defined by the equation F(y) = 0.

Under these circumstances, it is often the case that locally (meaning near a
given point $b \in S$) S can be represented as a smoothly parameterized surface of
some dimension and its tangent space can be realized as the set of solutions y to
the equation

\[ dF(b)(y - b) = 0. \]

We will learn more about when this is true in the last section of this chapter. For
now, we settle for a couple of preliminary results.


Theorem 9.4.9. With F as above, let V be an open subset of $\mathbb{R}^p$ and let
$G : V \to \mathbb{R}^q$ be a smooth function such that G(V) is contained in a level set of F. Then

\[ dF(y)\,dG(x) = 0, \quad\text{where}\quad y = G(x), \]

for each $x \in V$.

Proof. If the image of G lies in a level set of F, then there is a constant $c \in \mathbb{R}^d$
such that

\[ (F \circ G)(x) = c \quad\text{for all}\quad x \in V. \]

Then, by the Chain Rule,

\[ 0 = d(F \circ G)(x) = dF(G(x))\,dG(x). \]

□


Example 9.4.10. Show that a curve $\gamma$ in $\mathbb{R}^q$ of constant norm, $\|\gamma(t)\|$, has its
tangent vector orthogonal to its position vector at each point.

Solution: If $\|\gamma(t)\|$ is constant, then so is $\|\gamma(t)\|^2$. This means that $\gamma$ has its
image in a level set of the function $f(x) = \|x\|^2 = x \cdot x$. By the previous theorem,
$df(x)\,d\gamma(t) = 0$ if $x = \gamma(t)$ is a point on the curve. This means that the velocity
vector $\gamma'(t)$ is orthogonal to the gradient 2x of the function f at each point $x = \gamma(t)$
of the curve (see Exercise 9.4.6). Hence, $\gamma'(t)$ is orthogonal to $\gamma(t)$ at each t. Since
the tangent vector $T(t) = \gamma'(t)/\|\gamma'(t)\|$ is a scalar times $\gamma'(t)$, it is also orthogonal
to the position vector $\gamma(t)$ for each t.


How smooth is a level set for a smooth function $F : U \to \mathbb{R}^d$? Does it have a
tangent space at some or all of its points? If so, does it resemble a curved version
of its tangent space?

By Definition 9.4.7, in order for a level set S for F to have a tangent space
at a point $b \in S$, there must be a neighborhood of b in which S is a smoothly
parameterized p-surface. That is, near b, S must be the image of a smooth function
$G : V \to \mathbb{R}^q$, with V an open subset of $\mathbb{R}^p$ and the rank of dG equal to p (the
maximal rank possible) at each $a \in V$. Then the image of the affine function
$\Phi(x) = b + dG(a)(x - a)$ is a p-dimensional affine subspace of $\mathbb{R}^q$ (the tangent space
to S at b = G(a)). Also, by the previous theorem,

\[ 0 = dF(b)\,dG(a)(x - a) = dF(b)(\Phi(x) - b). \]

This means that the image of $\Phi - b$ is a linear subspace of $K = \ker dF(b)$. Hence,
K has dimension at least p and it has dimension exactly p if and only if the image
of $\Phi - b$ is equal to K. The dimension of K is p if and only if the rank of dF(b) is
q - p. Hence, we have proved


Theorem 9.4.11. With F as above and S a level set of F containing the point b,
if in some neighborhood of b the space S is a smoothly parameterized p-surface and
if dF(b) has rank q - p, then the tangent space to S at b is the set of solutions y to
the equation dF(b)(y - b) = 0. If the rank of dF(b) is less than q - p, then the set
of solutions to this equation contains the tangent space to S at b as a proper subset.


Example 9.4.12. If $f(x, y, z) = x^2 + y^2 - z^2$ and S is the level set for f defined
by $S = \{(x, y, z) : f(x, y, z) = 0\}$, show that at every point (a, b, c) on S, except at
the origin, S is a smoothly parameterized 2-surface with tangent space defined in
terms of the kernel of df as in the previous theorem. Give the resulting equation
for the tangent space. Then show that all of this fails at the origin.

Solution: The surface S is the same as the parameterized surface of Example
9.4.8 and Figure 9.4.2. By that example, S is a smoothly parameterized 2-surface
near each such point except the origin. At $(a, b, c) \ne (0, 0, 0)$, df is (2a, 2b, -2c),
since $\partial f/\partial z = -2z$. This has rank 1 = 3 - 2. Therefore, by the previous theorem,
S has a tangent space given by

\[ 2a(x - a) + 2b(y - b) - 2c(z - c) = 0. \]

At 0, df is the 0 matrix. Hence, the kernel of df(0) is all of $\mathbb{R}^3$. Since S
is the cone of Example 9.4.8, it is a two-dimensional surface and it does not seem
reasonable for it to have a three-dimensional tangent space at a point. The problem
is that S is not a smoothly parameterized surface in a neighborhood of the origin
and, hence, does not have a tangent space there in the sense we are using the term
in this text.
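
Theorem 9.4.9 can be illustrated on this cone: the image of the parameterization $G(r, \theta) = (r\cos\theta, r\sin\theta, r)$ of Example 9.4.8 lies in the level set $x^2 + y^2 - z^2 = 0$, so the row vector df(G) times the matrix dG should vanish. The Python sketch below computes df directly from $f(x, y, z) = x^2 + y^2 - z^2$; the sample points and helper names are assumptions made for illustration.

```python
import math

def G(r, theta):
    return (r * math.cos(theta), r * math.sin(theta), r)

def dG(r, theta):
    # 3 x 2 differential matrix of G.
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)],
            [1.0, 0.0]]

def df(x, y, z):
    # Gradient of f(x, y, z) = x^2 + y^2 - z^2 as a row vector.
    return (2.0 * x, 2.0 * y, -2.0 * z)

def df_times_dG(r, theta):
    # The 1 x 2 product df(G(r, theta)) dG(r, theta).
    row, J = df(*G(r, theta)), dG(r, theta)
    return [sum(row[i] * J[i][j] for i in range(3)) for j in range(2)]

# A few sample parameter values (r, theta) with r > 0.
products = [df_times_dG(r, th) for r, th in [(1.0, 0.0), (2.0, 0.5), (0.3, 2.0)]]
```

Each entry of `products` is zero up to rounding, so every column of dG lies in the kernel of df, in agreement with the tangent space described by Theorem 9.4.11.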


When can a level set of a function $F : U \to \mathbb{R}^d$ be represented as a smoothly
parameterized p-surface where q - p is the rank of dF(b)? That is the subject of the
implicit function theorem discussed in the last section of this chapter. At this point,
it is not clear that a level set of a smooth function F has a smooth parameterization
near any of its points.


For some level sets the construction of a smooth parameterization of the right 
dimension is easy. This is true of a level set which arises as the graph of a function, 
as the next example shows. 


Example 9.4.13. Show that if g is a smooth real-valued function defined on $\mathbb{R}^2$,
then each level set of the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by f(x, y, z) = z - g(x, y)
may be represented as a parameterized 2-surface.

Solution: Choose G(x, y) = (x, y, g(x, y) + c). This is a smooth function from
$\mathbb{R}^2$ to $\mathbb{R}^3$ with differential of rank 2 at each point and image equal to the level set

\[ S = \{ (x, y, z) : f(x, y, z) = c \}. \]


Exercise Set 9.4

1. If $f(x, y, z) = x\sin z + y\cos z$ at each $(x, y, z) \in \mathbb{R}^3$, then find the gradient df
of f at any point (x, y, z). What is $df(1, 2, \pi/4)$?

2. For the function $f(x, y) = x^2 + y^3 + xy$, find the gradient at the point (1, 1), the
direction of greatest ascent of f at this point, and a direction in which the rate
of increase of this function is 0 (the answers to the last two questions should
be unit vectors).

3. Find a parametric equation for the tangent line to the curve

HO = C1?)

at the point where t = 1.

4. For the curve $\gamma$ of Example 9.4.6, find a parametric equation of the tangent
line to this curve at (0, 0) if the domain of $\gamma(t)$ is $\{t : \pi < t < 2\pi\}$.

5. Prove Theorem 9.4.4.

6. Show that the gradient at $x \in \mathbb{R}^p$ of the function $g(x) = x \cdot x$ is the vector 2x.

7. Let $\gamma : \mathbb{R} \to \mathbb{R}^p$ be a curve which passes through the origin in $\mathbb{R}^p$ at a point
where its velocity vector is non-zero (that is, assume $\gamma(t_0) = 0$ and $\gamma'(t_0) \ne 0$
at some point $t_0 \in \mathbb{R}$). Prove that there is an interval J centered at $t_0$ such that
$\|\gamma(t)\|$ is decreasing for $t < t_0$ and increasing for $t > t_0$. Hint: $\|\gamma\|$ is increasing
(decreasing) wherever $\|\gamma\|^2 = \gamma \cdot \gamma$ is increasing (decreasing).

8. Find the tangent space at (2, 4, 1) for the parameterized surface in $\mathbb{R}^3$ parameterized
by the function $G : U \to \mathbb{R}^3$, where

\[ U = \{ (u, v) \in \mathbb{R}^2 : u > 0,\ v > 0 \} \quad\text{and}\quad G(u, v) = (uv, u^2, v^2). \]

9. If a surface in $\mathbb{R}^3$ is defined by the equation z = g(x, y), where g is a differentiable
function of (x, y) in an open set U, find the equation for the tangent
plane to this surface at a point (a, b, c) on the surface.

10. Find an equation for the tangent plane to the surface $z = x^2\sin y + 2x$ at the
point (1, 0, 2). Also find parametric equations for a line which passes through
this point and is perpendicular to the tangent plane.

11. Find the equation for the tangent plane to the cone $z = x^2 + y^2$ at the point
(1, 2, 5).

12. Show that for each point (a, b, c) on the surface $x^2 + y^2 + z^2 = 1$, there is a
neighborhood of (a, b, c) in which the surface may be represented as a smoothly
parameterized 2-surface. Hence, there is a tangent plane to this surface at every
point.

13. Find an equation for the tangent plane to the surface of the previous problem
at each point (a, b, c) on the surface.

14. Find an equation for the tangent plane to the surface $x^2 + y^2 - z^2 = 1$ at each
point (a, b, c) on the surface.




9.5. Taylor’s Formula 
In this section we discuss Taylor’s formula in several variables and some of its 
applications. 
The Formula. If a and x are points of $\mathbb{R}^p$, then a parameterized line passing
through a and x is given by

\[ \gamma(t) = a + t(x - a). \]

Note that $\gamma(0) = a$ and $\gamma(1) = x$. The line segment joining a to x is the closed
interval [a, x] on this line defined by

\[ [a, x] = \{ a + t(x - a) : t \in [0, 1] \}. \]

Let f be a real-valued function defined on an open subset $U \subset \mathbb{R}^p$ and suppose
that all partial derivatives of f through degree n exist on U and are themselves
differentiable on U. If $a, x \in U$ and the line segment joining a to x is contained in
U, then we set h = x - a and define a function g on an open interval I containing
[0, 1] by

\[ g(t) = f(a + th). \]

The function g is n + 1 times differentiable on I (by the Chain Rule) and so g
satisfies Taylor's formula (Theorem 6.5.3):

(9.5.1)    \[ g(t) = g(0) + g'(0)t + \frac{g''(0)}{2!}t^2 + \cdots + \frac{g^{(n)}(0)}{n!}t^n + R_n(t), \]

where

(9.5.2)    \[ R_n(t) = \frac{g^{(n+1)}(s)}{(n+1)!}t^{n+1} \]

for some s between 0 and t.

Since g(1) = f(a + h), to get a formula for f(a + h), we need only set t = 1 in
the above formula and then find expressions for the functions $g^{(k)}(0)$ and $g^{(n+1)}(s)$ in
terms of f and its derivatives. This is not difficult for the first few terms:

(9.5.3)    \[ \begin{aligned} g(0) &= f(a), \\ g'(0) &= df(a)h = \sum_{j=1}^{p} \frac{\partial f}{\partial x_j}(a)\,h_j, \\ g''(0) &= h \cdot d^2 f(a)h = \sum_{i,j=1}^{p} \frac{\partial^2 f}{\partial x_i \partial x_j}(a)\,h_i h_j. \end{aligned} \]

Here we have used $d^2 f(a)$ to stand for the matrix

\[ \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(a) \right)_{ij}. \]

If we apply this matrix to h, the result is a vector of length p and we may take the
inner product of h with this vector. The result is the formula for g''(0) in (9.5.3).


The $k$th derivative of $g$ at $0$ is
\[ g^{(k)}(0) = \sum_{i_1=1}^{p} \cdots \sum_{i_k=1}^{p} \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(a)\, h_{i_1} \cdots h_{i_k}. \tag{9.5.4} \]
We may think of this as a $k$-dimensional array (a tensor of rank $k$)
\[ d^k f(a) = \left( \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(a) \right), \]
applied $k$ times to the vector $h$. Here, applying a tensor of rank $k$ to a vector $h$ yields a tensor of rank $k - 1$ in the same way that applying a matrix (tensor of rank 2) to a vector produces a vector (a tensor of rank 1). Thus, applying the tensor $d^k f(a)$ to the vector $h$ produces the tensor of rank $k - 1$:
\[ d^k f(a)h = \left( \sum_{i_k=1}^{p} \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(a)\, h_{i_k} \right). \]
This has rank $k - 1$ because we have summed over the index $i_k$, and so the result is no longer a function of this index. If we repeat this $k$ times, we obtain the number (tensor of rank 0) expressed in (9.5.4). This is the result of applying $d^k f(a)$ a total of $k$ times to the vector $h$ and, hence, we will denote it by $d^k f(a)h^k$. Note, in particular, that $d^2 f(a)h^2$ is just $h \cdot d^2 f(a)h$.
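The repeated application of a tensor to a vector described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tensor is stored as a nested list, and the names `apply_once`, `apply_k_times`, and `full_sum` are hypothetical helpers, not notation from the text. The sketch compares $k$ successive applications with the $k$-fold sum in (9.5.4).

```python
from itertools import product

def apply_once(tensor, h):
    """Contract the last index of a nested-list tensor with the vector h."""
    if isinstance(tensor[0], (int, float)):        # rank 1: ordinary dot product
        return sum(t * hi for t, hi in zip(tensor, h))
    return [apply_once(sub, h) for sub in tensor]  # recurse into leading indices

def apply_k_times(tensor, h, k):
    """Apply a rank-k tensor to h a total of k times, giving a number."""
    result = tensor
    for _ in range(k):
        result = apply_once(result, h)
    return result

def full_sum(tensor, h, k):
    """The k-fold sum over i_1, ..., i_k, written out as in (9.5.4)."""
    p = len(h)
    total = 0.0
    for idx in product(range(p), repeat=k):
        entry = tensor
        coeff = 1.0
        for i in idx:
            entry = entry[i]      # pick out the (i_1, ..., i_k) entry
            coeff *= h[i]         # build the product h_{i_1} ... h_{i_k}
        total += entry * coeff
    return total

# Hypothetical rank 3 tensor with p = 2 in which every entry equals 1.
ones3 = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
h = [0.5, -0.25]
value = apply_k_times(ones3, h, 3)

# Rank 2 case: a matrix applied twice gives h . Ah.
A = [[2, 1], [1, 2]]
quad = apply_k_times(A, h, 2)
```

For the all-ones tensor the result is $(h_1 + h_2)^3$, and for the matrix case the result agrees with $h \cdot Ah$.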


If we use this notation for the derivatives of $g$ in (9.5.1) and (9.5.2), the result is
\[ f(a + h) = f(a) + df(a)h + \frac{1}{2}d^2 f(a)h^2 + \cdots + \frac{1}{n!}d^n f(a)h^n + R_n, \tag{9.5.5} \]
where
\[ R_n = \frac{1}{(n+1)!}\,d^{n+1} f(c)h^{n+1} \tag{9.5.6} \]
for some point $c$ on the line segment joining $a$ to $a + h$. This is Taylor's formula in several variables. Expressed in terms of the variable $x = a + h$ (so that $h = x - a$), this becomes the formula of the following theorem.


Theorem 9.5.1. Let $f$ be a real-valued function defined on an open set $U \subset \mathbb{R}^p$ and suppose all partial derivatives of $f$ through degree $n$ exist and are differentiable on $U$. If $a, x \in U$ and $U$ contains the line segment $[a, x]$, then
\[ f(x) = f(a) + df(a)(x - a) + \frac{1}{2}d^2 f(a)(x - a)^2 + \cdots + \frac{1}{n!}d^n f(a)(x - a)^n + R_n, \]
where
\[ R_n = \frac{1}{(n+1)!}\,d^{n+1} f(c)(x - a)^{n+1} \]
for some point $c$ on the line segment $[a, x]$.


Example 9.5.2. Find the degree $n = 2$ Taylor formula for $f(x, y) = \ln(x + y)$ at the point $a = (0, 1)$.

Solution: We will need expressions for all partial derivatives of $f$ through degree 3. However, these are easy to calculate because each $k$th-order partial derivative of $f$ is just the $k$th derivative of $\ln$ evaluated at $x + y$. Thus, $f(0, 1) = 0$, and all first-order partial derivatives of $f$ are $(x + y)^{-1}$, which is $1$ at $(0, 1)$. The second-degree partial derivatives are all equal to $-(x + y)^{-2}$, which is $-1$ at $(x, y) = (0, 1)$. Each third-degree partial derivative is $2(x + y)^{-3}$. Thus, the degree 2 Taylor formula for $f$ is
\[ \ln(x + y) = (1, 1) \cdot (x, y - 1) - \frac{1}{2}\,(x, y - 1) \cdot \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y - 1 \end{pmatrix} + R_2 = (x + y - 1) - \frac{1}{2}(x + y - 1)^2 + R_2, \]
where
\[ R_2 = \frac{1}{3!} \cdot \frac{2}{c^3}\,(x + y - 1)^3 = \frac{(x + y - 1)^3}{3c^3} \]
for some $c$ between $1$ and $x + y$. Here the factor $(x + y - 1)^3$ is the result of applying the rank 3 tensor which is 1 in every entry three times to the vector $(x, y - 1)$.
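As a numerical sanity check of this example, the following Python sketch compares $\ln(x + y)$ with its degree 2 Taylor polynomial at $(0, 1)$. It assumes the remainder has the form $(x + y - 1)^3 / (3c^3)$ with $c$ between $1$ and $x + y$, which follows from the third-degree partial derivatives $2(x + y)^{-3}$ computed above. The function names and the sample point are hypothetical choices made for illustration.

```python
import math

def taylor2(x, y):
    """Degree 2 Taylor polynomial of ln(x + y) at a = (0, 1)."""
    s = x + y - 1.0              # each derivative tensor contracts to a power of s
    return s - 0.5 * s * s

def remainder_bound(x, y):
    """Upper bound for |R_2| = |s|^3 / (3 c^3), with c between 1 and x + y."""
    s = x + y - 1.0
    c_min = min(1.0, x + y)      # the smallest admissible c gives the largest bound
    return abs(s) ** 3 / (3.0 * c_min ** 3)

x, y = 0.1, 1.05                 # hypothetical sample point near (0, 1)
error = abs(math.log(x + y) - taylor2(x, y))
bound = remainder_bound(x, y)
```

At this sample point the actual error stays below the bound obtained from the remainder term, as Theorem 9.5.1 predicts.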


The Mean Value Theorem. The Mean Value Theorem for a real-valued function on an open subset of $\mathbb{R}^p$ is a special case of Taylor's formula. In fact, if we apply Theorem 9.5.1 in the case $n = 0$, it yields
\[ f(x) = f(a) + R_0, \]
where
\[ R_0 = df(c)(x - a) \]
for some $c$ on the line segment joining $a$ to $x$. Thus, we have proved

Theorem 9.5.3. If $f$ is a differentiable real-valued function on $B_r(a) \subset \mathbb{R}^p$, then for $x \in B_r(a)$ we have
\[ f(x) - f(a) = df(c)(x - a) \]
for some point $c$ on the line segment joining $a$ to $x$.


As is the case for functions of one variable, the several variable Mean Value
Theorem has a host of applications. We point out two of these in the following 
corollaries, the proofs of which are left to the exercises. 


Definition 9.5.4. A subset $A \subset \mathbb{R}^p$ is said to be convex if, for each pair of points $x, y \in A$, the line segment $[x, y]$ is also contained in $A$.


Figure 9.5.1 illustrates examples of a convex set and a set which is not convex. 


Corollary 9.5.5. Suppose $U$ is an open convex set and $f$ is a differentiable real-valued function on $U$. If there is a number $M > 0$ such that $\|df(x)\| \le M$ for all $x \in U$, then
\[ |f(x) - f(y)| \le M\|x - y\| \]
for all $x, y \in U$.

Corollary 9.5.6. Let $U$ be a connected open subset of $\mathbb{R}^p$ and let $f$ be a differentiable function on $U$. If $df(x) = 0$ for all $x \in U$, then $f$ is a constant.


Figure 9.5.1. Convex and Non-convex Sets. (Panels: Convex, Not Convex.)


Max and Min. We know that if $f$ is a real-valued function of one variable, defined on an interval $I$, which has a local maximum or minimum at an interior point $a$ of $I$, then either $f'(a)$ fails to exist or $f'(a) = 0$. We now discuss the several variable analogue of this result.

A function $f$ defined on a subset $D \subset \mathbb{R}^p$ is said to have a local maximum at $a \in D$ if there is a ball $B_r(a)$, centered at $a$, such that
\[ f(x) \le f(a) \quad \text{for all } x \in D \cap B_r(a). \]
If $a$ is an interior point of $D$, then $r$ may be chosen so that $B_r(a) \subset D$ and then this inequality holds for all $x \in B_r(a)$. The concept of local minimum is defined in the same way, but with the inequality reversed.


Theorem 9.5.7. If $f$ is a function defined on $D \subset \mathbb{R}^p$ and if $f$ has a local maximum or a local minimum at an interior point $a \in D$ at which $f$ is differentiable, then
\[ df(a) = 0. \]

Proof. Given any unit vector $u$, the function $g(t) = f(a + tu)$ is defined for all real numbers $t$ in an open interval containing $0$ and it has a local maximum (or minimum) at $t = 0$. By the Chain Rule, $g$ is differentiable at $0$ and its derivative at $0$ is the directional derivative $df(a) \cdot u$ of $f$ at $a$ in the direction $u$. Since the derivative of $g$ at $0$ must be $0$, we conclude that $df(a) \cdot u = 0$ for all unit vectors $u$ and, hence, $df(a) = 0$. □


This theorem does not tell us that a function must have a local max or min at a point where $df$ is $0$. However, for functions of one variable, the second derivative test does give conditions that ensure that a local max or a local min occurs at $a$.

The second derivative test for functions of one variable says that if $f$ is a real-valued function on an interval $I$, then $f$ has a local maximum at $a \in I$ if $f'(a) = 0$ and $f''(a) < 0$. It has a local minimum at $a$ if $f'(a) = 0$ and $f''(a) > 0$. The analogue of this in several variables will be presented below, but it requires the concept of a positive definite matrix.


Definition 9.5.8. A $p \times p$ matrix $A$ is said to be positive definite if $h \cdot Ah > 0$ for every non-zero vector $h \in \mathbb{R}^p$. It is negative definite if $h \cdot Ah < 0$ for every non-zero vector $h \in \mathbb{R}^p$.

Note that, in checking to see if a matrix is positive definite, we only need to check that $u \cdot Au > 0$ for every unit vector $u$ in $\mathbb{R}^p$. This is because, if $h$ is any non-zero vector, then $u = h/\|h\|$ is a unit vector and $h \cdot Ah = \|h\|^2\, u \cdot Au$, which is positive if and only if $u \cdot Au$ is positive.

It turns out that if a matrix is positive definite, then all nearby matrices are also positive definite. We will prove this using the concept of operator norm for a matrix (Definition 8.4.9). Recall that $\|Ax\| \le \|A\|\,\|x\|$ if $x$ is a vector in $\mathbb{R}^p$, $A$ is a $p \times p$ matrix, and $\|A\|$ is the operator norm of $A$.

Lemma 9.5.9. If $A$ is a positive definite $p \times p$ matrix, then there is a positive number $m$ such that if $B$ is any $p \times p$ matrix with $\|B - A\| < m/2$, then $u \cdot Bu > m/2$ for all unit vectors $u \in \mathbb{R}^p$ and, hence, $B$ is also positive definite.


Proof. The set of all unit vectors $u$ is a closed bounded subset of $\mathbb{R}^p$. It is, therefore, compact. The function $g(u) = u \cdot Au$ is a continuous real-valued function on this set and, hence, by Corollary 8.2.5, it takes on a minimum value $m$. Since $u \cdot Au > 0$ for all such $u$, we conclude that $m > 0$. Now it follows from the Cauchy-Schwarz inequality that
\[ u \cdot (A - B)u \le \|u\|\,\|(A - B)u\| \le \|u\|^2\,\|A - B\| = \|A - B\|. \]
This implies
\[ u \cdot Bu = u \cdot Au - u \cdot (A - B)u \ge m - \|A - B\| \tag{9.5.7} \]
for all unit vectors $u$. Hence, if $\|A - B\| < m/2$, then $u \cdot Bu > m/2$ for all unit vectors $u$, which implies that $B$ is positive definite. □
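The lemma can be illustrated numerically for $2 \times 2$ matrices. The Python sketch below is a rough check, not a proof: it approximates $m$ by sampling unit vectors on the circle, and it uses the Frobenius norm as a convenient upper bound for the operator norm (an assumption made here for simplicity; the text works with the operator norm itself). The matrices `A` and `B` and all function names are hypothetical.

```python
import math

def quad_form(A, u):
    """Return u . A u for a 2 x 2 matrix A and a vector u."""
    Au = (A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1])
    return u[0] * Au[0] + u[1] * Au[1]

def min_on_unit_circle(A, samples=3600):
    """Approximate the minimum of u . A u over unit vectors by sampling angles."""
    return min(
        quad_form(A, (math.cos(t), math.sin(t)))
        for t in (2 * math.pi * k / samples for k in range(samples))
    )

def frobenius_norm(A):
    """Frobenius norm; it is at least as large as the operator norm."""
    return math.sqrt(sum(x * x for row in A for x in row))

A = [[2.0, 1.0], [1.0, 2.0]]      # positive definite: u . A u = 2 + sin(2t) >= 1
m = min_on_unit_circle(A)         # approximately 1

B = [[2.2, 0.9], [1.1, 1.8]]      # a nearby, non-symmetric matrix
diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
close_enough = frobenius_norm(diff) < m / 2
m_B = min_on_unit_circle(B)       # the lemma predicts this exceeds m / 2
```

Since the Frobenius norm of $A - B$ is below $m/2$, so is the operator norm, and the sampled minimum of $u \cdot Bu$ is indeed larger than $m/2$.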


Theorem 9.5.10. Let $f$ be a real-valued function defined on a neighborhood of $a \in \mathbb{R}^p$. Suppose the second-order partial derivatives of $f$ exist in this neighborhood and are continuous at $a$. If $df(a) = 0$ and $d^2 f(a)$ is positive definite, then $f$ has a local minimum at $a$. If $df(a) = 0$ and $d^2 f(a)$ is negative definite, then $f$ has a local maximum at $a$.

Proof. We use Taylor's formula with $n = 1$. Since $df(a) = 0$, it tells us that there is an $r > 0$ such that, for each $h \in B_r(0)$,
\[ f(a + h) = f(a) + \frac{1}{2}\,h \cdot d^2 f(c)h, \tag{9.5.8} \]
for some $c$ on the line segment joining $a$ to $a + h$.

Assume $d^2 f(a)$ is positive definite. By the previous lemma, there is an $m > 0$ such that if
\[ \|d^2 f(a) - d^2 f(c)\| < m/2, \tag{9.5.9} \]
then $d^2 f(c)$ is also positive definite.

Since the second-order partial derivatives of $f$ are continuous at $a$ and since $\|c - a\| \le \|h\|$, it follows from Theorem 8.4.11 that we can ensure (9.5.9) holds by choosing $\|h\|$ sufficiently small. Hence, there is a $\delta > 0$, with $\delta < r$, such that $\|h\| < \delta$ implies that $d^2 f(c)$ is positive definite for all $c$ on the line segment joining $a$ to $a + h$. By (9.5.8), this implies that $f(a + h) > f(a)$ whenever $0 < \|h\| < \delta$. Thus, $f$ has a local minimum at $a$ in this case.

The case where $d^2 f(a)$ is negative definite follows from the above by simply applying the above result to $-f$. □




Max/Min for Functions of Two Variables. Let $f$ be a function of two variables with second-order partial derivatives which are defined in a neighborhood of $(x_0, y_0) \in \mathbb{R}^2$ and continuous at this point. The matrix $d^2 f$ has the form
\[ \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[2ex] \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}. \]
Since the second-order partial derivatives are continuous at $(x_0, y_0)$, the cross partials are equal and so this matrix is symmetric (meaning it is its own transpose) at $(x_0, y_0)$. There is a simple criterion for a symmetric $2 \times 2$ matrix to be positive definite. This is described in the next theorem, the proof of which is left to the exercises.


Theorem 9.5.11. Let $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ be a symmetric $2 \times 2$ matrix and let $\Delta = ac - b^2$ be its determinant. Then

(a) $A$ is positive definite if and only if $\Delta > 0$ and $a > 0$;
(b) $A$ is negative definite if and only if $\Delta > 0$ and $a < 0$;
(c) if $\Delta < 0$, then there are vectors $u, v \in \mathbb{R}^2$ with $u \cdot Au > 0$ and $v \cdot Av < 0$.


For a function $f$ on $\mathbb{R}^p$, a point $a$ where the expression $u \cdot d^2 f(a)u$ is positive for some unit vectors $u$ and negative for others is called a saddle point. At such a point, there will exist lines through $a$ along which $f$ has a local maximum at $a$ and other lines through $a$ along which $f$ has a local minimum at $a$.


The previous theorem has the following corollary, the proof of which is also left 
to the exercises. 


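Corollary 9.5.12. Let $f$ be a function of two variables with second-order partial derivatives which are defined in a neighborhood of $(x_0, y_0) \in \mathbb{R}^2$ and continuous at this point. Suppose $df(x_0, y_0) = 0$ and let
\[ \Delta = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x\,\partial y} \right)^2 \]
evaluated at $(x_0, y_0)$. Then

(a) $f$ has a local minimum at $(x_0, y_0)$ if $\Delta > 0$ and $\dfrac{\partial^2 f}{\partial x^2} > 0$ at $(x_0, y_0)$;
(b) $f$ has a local maximum at $(x_0, y_0)$ if $\Delta > 0$ and $\dfrac{\partial^2 f}{\partial x^2} < 0$ at $(x_0, y_0)$;
(c) if $\Delta < 0$, then $f$ has a saddle point at $(x_0, y_0)$.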


Example 9.5.13. Find all points where $f(x, y) = x^2 + xy + y^2 - 2x - 4y + 1$ has a local maximum and all points where it has a local minimum.

Solution: We have $df(x, y) = (2x + y - 2,\; x + 2y - 4)$. Thus, the only point at which $df(x, y) = 0$ is the point $a = (0, 2)$. This is the only possible point at which a local max or min can occur. The second differential $d^2 f(x, y)$ is the constant matrix
\[ d^2 f(x, y) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]
This has determinant $\Delta = 3 > 0$, and its upper left entry is $2 > 0$. By the previous corollary, we conclude that $(0, 2)$ is a point at which a local minimum occurs and there is no local maximum.

Figure 9.5.2. Surfaces with Max, Min, and Saddle Points. (Panels: Max, Min, Saddle.)


Example 9.5.14. Find all points where $f(x, y) = x^2 + 3xy + y^2 - x - 4y + 5$ has a local maximum, minimum, or saddle.

Solution: We have $df(x, y) = (2x + 3y - 1,\; 3x + 2y - 4)$. Thus, the only point at which $df(x, y) = 0$ is the point $a = (2, -1)$. This is the only possible point at which a local max or min can occur. The second differential $d^2 f(x, y)$ is the constant matrix
\[ d^2 f(x, y) = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}. \]
This has determinant $\Delta = -5$. Thus, $(2, -1)$ is a saddle point for $f$.
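The test in Corollary 9.5.12 is mechanical enough to sketch in Python. The following is a minimal illustration, assuming the critical point has already been located and the second-order partials evaluated there; the function names `classify` and `solve_2x2` are hypothetical. It reproduces the conclusions of Examples 9.5.13 and 9.5.14, whose differentials are linear, so the critical points can be found with Cramer's rule.

```python
def classify(fxx, fxy, fyy):
    """Classify a critical point of f(x, y) from its second-order partials."""
    delta = fxx * fyy - fxy ** 2
    if delta > 0 and fxx > 0:
        return "local minimum"
    if delta > 0 and fxx < 0:
        return "local maximum"
    if delta < 0:
        return "saddle point"
    return "inconclusive"          # delta == 0: the test gives no information

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2 x 2 linear system by Cramer's rule (non-zero determinant assumed)."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

# Example 9.5.13: df = (2x + y - 2, x + 2y - 4), d^2 f = [[2, 1], [1, 2]].
crit_13 = solve_2x2(2, 1, 1, 2, 2, 4)
kind_13 = classify(2, 1, 2)

# Example 9.5.14: df = (2x + 3y - 1, 3x + 2y - 4), d^2 f = [[2, 3], [3, 2]].
crit_14 = solve_2x2(2, 3, 3, 2, 1, 4)
kind_14 = classify(2, 3, 2)
```

The "inconclusive" branch corresponds to $\Delta = 0$, a case the corollary does not address.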


Lagrange Multipliers. Suppose $U$ is an open subset of $\mathbb{R}^q$ and $f : U \to \mathbb{R}$ and $G : U \to \mathbb{R}^d$ are differentiable functions. The subject of Lagrange multipliers concerns the problem of finding points of local maximum or local minimum of $f$ subject to the constraint that $G(x) = 0$. That is, we wish to find the points of local maximum and local minimum of $f$ considered as a function on the level set $G(x) = 0$ for $G$. The following theorem applies to this problem. Its proof uses a corollary of the Implicit Function Theorem which will be proved at the end of this chapter.

Theorem 9.5.15. With $U$, $f$, and $G$ as above, suppose that $dG$ has rank $d$ on $U$ and $S$ is the level set $S = \{x \in U : G(x) = 0\}$. If $b$ is a point of relative max or min for $f$ on $S$, then there is a linear transformation $\lambda : \mathbb{R}^d \to \mathbb{R}$ such that
\[ df(b) = \lambda\, dG(b). \]


Proof. In Corollary 9.7.3 we will prove that, under the above conditions, $S$ is a smoothly parameterized $p$-surface, with $p = q - d$, in a neighborhood of each point of $S$. We will assume this result here and we may as well assume that $U$ is the neighborhood. Then there is an open subset $V$ of $\mathbb{R}^p$ such that $S$ is the image of a one-to-one differentiable function $H : V \to U$ with $\operatorname{Rank} dH = p$ on $V$. Furthermore, since $G \circ H = 0$ on $V$, we have $dG(H(a)) \circ dH(a) = 0$ for each $a \in V$. Thus, if $a \in V$ and $b = H(a)$, then the kernel of $dG(b)$ contains the image of $dH(a)$. However, the kernel of $dG(b)$ has dimension $q - d = p$ as does the image of $dH(a)$. It follows that the two subspaces of $\mathbb{R}^q$ are equal.




Since $f$ has a local max or min on $S$ at $b$, $f \circ H$ has a local max or min on $V$ at $a$. This implies $df(b)\,dH(a) = d(f \circ H)(a) = 0$. Since $dG(b)$ has rank $d$, its image is all of $\mathbb{R}^d$. Thus, for each $y \in \mathbb{R}^d$, there is an $x \in \mathbb{R}^q$ such that $dG(b)x = y$. We then set $\lambda(y) = df(b)x$. If $x_1$ is another vector in $\mathbb{R}^q$ with $dG(b)x_1 = y$, then $x - x_1 \in \ker dG(b) = \operatorname{Im}(dH(a))$ and so $df(b)(x - x_1) = 0$. This means $df(b)x$ is the same number no matter which vector $x$ is chosen with $dG(b)x = y$. Thus, $\lambda(y)$ is well defined by the condition
\[ \lambda(y) = df(b)x \quad \text{whenever } dG(b)x = y. \tag{9.5.10} \]
For vectors $y_1, y_2 \in \mathbb{R}^d$ we may choose $x_1, x_2$ such that $dG(b)x_i = y_i$. Then $dG(b)(x_1 + x_2) = dG(b)x_1 + dG(b)x_2 = y_1 + y_2$ and so
\[ \lambda(y_1 + y_2) = df(b)(x_1 + x_2) = df(b)x_1 + df(b)x_2 = \lambda(y_1) + \lambda(y_2). \]
A similar argument shows that $\lambda(ky) = k\lambda(y)$ if $k$ is a scalar. Thus, $\lambda$ is a linear transformation. By (9.5.10), $\lambda$ satisfies $df(b) = \lambda\, dG(b)$. □


The above result looks less mysterious if we write it out in terms of the coordinate functions of $G$. If $G = (g_1, \ldots, g_d)$, then $S$ is the surface of vectors $x \in \mathbb{R}^q$ which satisfy the constraints
\[ g_1(x) = 0, \; \ldots, \; g_d(x) = 0. \tag{9.5.11} \]
The theorem says that if $b$ is a point of $S$ at which $f$ has a local max or min on $S$, then there is a vector $\lambda = (\lambda_1, \ldots, \lambda_d)$ such that
\[ \frac{\partial f}{\partial x_k}(b) = \sum_{j=1}^{d} \lambda_j \frac{\partial g_j}{\partial x_k}(b) \quad \text{for } k = 1, \ldots, q. \tag{9.5.12} \]
Thus, to find candidates for points on $S$ where a local max or min could occur, one should solve the equations (9.5.11) and (9.5.12) for $x_1, \ldots, x_q, \lambda_1, \ldots, \lambda_d$. Note that this system of equations has $d + q$ equations and $d + q$ unknowns. The components $\lambda_1, \ldots, \lambda_d$ of $\lambda$ are called Lagrange multipliers.


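Example 9.5.16. Find where the function $f(x, y, z) = 2xy + z$ attains its maximum and minimum values on $S = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$.

Solution: Since the unit sphere in $\mathbb{R}^3$ is compact and $f$ is continuous, there are points on $S$ where $f$ attains its maximum and minimum values. We use the method of Lagrange multipliers, as described in the previous theorem, to obtain candidates for these points. Here, $d = 1$ and $q = 3$ in (9.5.11) and (9.5.12).

With $g(x, y, z) = x^2 + y^2 + z^2 - 1$, we must solve the system of equations:
\[ g(x, y, z) = 0, \quad \frac{\partial f}{\partial x} = \lambda \frac{\partial g}{\partial x}, \quad \frac{\partial f}{\partial y} = \lambda \frac{\partial g}{\partial y}, \quad \frac{\partial f}{\partial z} = \lambda \frac{\partial g}{\partial z}. \]
These are the equations
\[ x^2 + y^2 + z^2 = 1, \quad 2y = 2\lambda x, \quad 2x = 2\lambda y, \quad 1 = 2\lambda z. \]
The second and third equations yield $x = \lambda^2 x$ and $y = \lambda^2 y$. These hold if and only if $x = y = 0$ or $\lambda = \pm 1$. If $x = y = 0$, the first equation gives $z = \pm 1$. On the other hand, $\lambda = \pm 1$ implies $x = \pm y$ and, together with the fourth equation, implies $z = \pm 1/2$ (with the same choice of sign). This and the first equation imply $x = \pm\sqrt{3/8}$ and $y = \pm\sqrt{3/8}$. Thus, the solutions of the above system of equations are
\[ (0, 0, \pm 1), \quad (\sqrt{3/8}, \sqrt{3/8}, 1/2), \quad (-\sqrt{3/8}, -\sqrt{3/8}, 1/2), \]
\[ (-\sqrt{3/8}, \sqrt{3/8}, -1/2), \quad \text{and} \quad (\sqrt{3/8}, -\sqrt{3/8}, -1/2). \]
The values of $f$ at these points are, respectively, $\pm 1$, $5/4$, $5/4$, $-5/4$, $-5/4$. So, $f$ has maximum $5/4$ attained at $(\sqrt{3/8}, \sqrt{3/8}, 1/2)$ and $(-\sqrt{3/8}, -\sqrt{3/8}, 1/2)$ and minimum $-5/4$ attained at $(-\sqrt{3/8}, \sqrt{3/8}, -1/2)$ and $(\sqrt{3/8}, -\sqrt{3/8}, -1/2)$.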
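The candidate points of this example can be checked numerically. The Python sketch below is an illustration only: it takes the candidates as given (including both points $(0, 0, \pm 1)$ on the $z$-axis), reads $\lambda$ off from the equation $1 = 2\lambda z$, and confirms that each candidate lies on the sphere and satisfies $df = \lambda\, dg$. The helper name `multiplier_residual` is hypothetical.

```python
import math

def f(x, y, z):
    return 2 * x * y + z

def g(x, y, z):
    return x * x + y * y + z * z - 1

def multiplier_residual(x, y, z):
    """Largest component of df - lambda * dg, with lambda taken from 1 = 2 * lambda * z."""
    lam = 1 / (2 * z)
    df = (2 * y, 2 * x, 1.0)
    dg = (2 * x, 2 * y, 2 * z)
    return max(abs(a - lam * b) for a, b in zip(df, dg))

r = math.sqrt(3 / 8)
candidates = [
    (0, 0, 1), (0, 0, -1),
    (r, r, 0.5), (-r, -r, 0.5),
    (-r, r, -0.5), (r, -r, -0.5),
]
values = [f(*p) for p in candidates]
on_sphere = all(abs(g(*p)) < 1e-12 for p in candidates)
stationary = all(multiplier_residual(*p) < 1e-12 for p in candidates)
```

The largest and smallest entries of `values` are $5/4$ and $-5/4$, matching the maximum and minimum found in the example.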


Exercise Set 9.5 


1. Find the degree $n = 2$ Taylor formula for $f(x, y) = x^2 + xy$ at the point $a = (1, 2)$.
2. Find the degree $n = 2$ Taylor formula for $f(x, y) = e^{xy}$ at the point $a = (0, 0)$.


3. Suppose $a \in \mathbb{R}^p$ and $f$ is a real-valued function whose second-order partial derivatives all exist and are continuous on $B_r(a)$. Also, suppose that the operator norm $\|d^2 f(x)\|$ of the matrix $d^2 f(x)$ is bounded by $M$ on $B_r(a)$. Prove that
\[ |f(x) - f(a) - df(a)(x - a)| \le \frac{M}{2}\,\|x - a\|^2 \]
for all $x \in B_r(a)$.


4. Prove Corollary 9.5.5. 


5. Prove Corollary 9.5.6.

6. Show that the following form of the Mean Value Theorem is not true: if $F : \mathbb{R}^p \to \mathbb{R}^q$ is a differentiable function and $a, b \in \mathbb{R}^p$, then there is a $c$ on the line segment joining $a$ to $b$ such that $F(b) - F(a) = dF(c)(b - a)$. The problem here is that $F$ is vector-valued, not real-valued.

7. Show that the following version of the Mean Value Theorem for vector-valued functions is true: if $U$ is an open set in $\mathbb{R}^p$ containing the line segment joining $a$ to $b$ and if $F : U \to \mathbb{R}^q$ is a differentiable function on $U$, then, for each vector $u \in \mathbb{R}^q$, there is a point $c$ on the line segment joining $a$ to $b$ such that
\[ u \cdot (F(b) - F(a)) = u \cdot dF(c)(b - a). \]
8. Find all points of relative maximum and relative minimum and all saddle points for
\[ f(x, y) = 1 - 2x^2 - 2xy - y^2. \]
9. Find all points of relative maximum and relative minimum and all saddle points for
\[ f(x, y) = y^3 + y^2 + x^2 - 2xy - 3y. \]
10. Prove Theorem 9.5.11.

11. Prove Corollary 9.5.12.




12. Show that it is possible for a function to have a relative minimum or maximum or a saddle at a point where both $df$ and $d^2 f$ are $0$.

13. Use the Lagrange multiplier method to find the maximal and minimal values of $f(x, y, z) = x - 2y + 3z$ on the sphere $x^2 + y^2 + z^2 = 1$.


9.6. The Inverse Function Theorem 


If $f$ is a real-valued function of one variable which is $C^1$ on an open interval containing $a$ and if $f'(a) \ne 0$, then $f'(a)$ is either positive or negative. Because $f'$ is continuous, $f'(x)$ will have the same sign as $f'(a)$ for all $x$ in some neighborhood of $a$. This implies that $f$ is strictly monotone in a neighborhood of $a$ and, hence, has an inverse function. This inverse function is differentiable at $b = f(a)$ and $(f^{-1})'(b) = 1/f'(a)$ (Theorem 4.2.9). In this section we will prove an analogous result for a vector-valued function $F$ of several variables.


The condition that $f'(a) \ne 0$ is replaced in several variables by the condition that $dF(a)$ is a non-singular matrix (a matrix for which there is an inverse matrix). The conclusion that $f$ is strictly monotone in some open interval containing $a$ is replaced by the conclusion that $F$ is a one-to-one function in some neighborhood of $a$ in $\mathbb{R}^p$.

A function $F : V \to W$ is one-to-one on $V$ if whenever $x, y \in V$ and $x \ne y$, then $F(x) \ne F(y)$. It is onto $W$ if every $u \in W$ is $F(x)$ for some $x \in V$.


Definition 9.6.1. With $F$ as above, we will say that $F$ has a smooth local inverse near $a$ if there are neighborhoods $V$ of $a$ and $W$ of $F(a)$ such that $F$ is a one-to-one function from $V$ onto $W$ and the function $F^{-1} : W \to V$, defined by $F^{-1}(u) = x$ if $F(x) = u$, is smooth on $W$.

In what follows (until the proof of the Inverse Function Theorem is complete), $U$ will be an open subset of $\mathbb{R}^p$ and $F : U \to \mathbb{R}^p$ will be a smooth (that is, $C^1$) function on $U$. We will prove that $F$ has a smooth local inverse near any point $a \in U$ at which its differential is non-singular.

The proof involves three steps. Assuming $dF(a)$ is non-singular: (1) we show that $F$ is one-to-one in a neighborhood of $a$; (2) we show that $F$ maps this neighborhood onto an open set; (3) we show that the resulting inverse function is smooth and we calculate its differential.


One-to-one. The next theorem shows that our function F is necessarily one- 
to-one on some open ball centered at a point where dF is non-singular. In fact, it 
shows much more. 


Theorem 9.6.2. If $a \in U$ and $dF(a)$ is non-singular, then there is an open ball $B_r(a)$, centered at $a$, and a positive number $M$ such that:

(a) the matrix $dF(x)$ is non-singular for all $x \in B_r(a)$;
(b) $\|x - y\| \le M\|F(x) - F(y)\|$ for all $x, y \in B_r(a)$;
(c) the function $F$ is one-to-one on $B_r(a)$.




Proof. Let $B$ be an inverse matrix for $dF(a)$. Then $d(BF)(a) = B\,dF(a) = I$, where $I$ is the $p \times p$ identity matrix (Exercise 9.3.1).

Let $G(x) = BF(x)$. Note that $dG(a) = I$, which is positive definite (since $u \cdot Iu = \|u\|^2 = 1$ for every unit vector $u \in \mathbb{R}^p$). Hence, by Lemma 9.5.9, there is an $m > 0$ such that $\|dG(x) - dG(a)\| < m/2$ implies that $dG(x)$ is also positive definite and, in fact,
\[ m/2 < u \cdot dG(x)u \]
for each unit vector $u \in \mathbb{R}^p$.

The partial derivatives of the coordinate functions of $F$ are all continuous and so the same thing is true of $G$. It follows from Theorem 8.4.11 that, given $m > 0$, there is an $r$ such that $B_r(a) \subset U$ and
\[ \|dG(x) - dG(a)\| < m/2 \quad \text{whenever } \|x - a\| < r. \]
Thus,
\[ u \cdot dG(x)u > m/2 \tag{9.6.1} \]
for all $x \in B_r(a)$ and all unit vectors $u \in \mathbb{R}^p$. In particular, $dG(x)$ is positive definite and, hence, non-singular, for all $x \in B_r(a)$. Since $dF(x) = B^{-1}dG(x)$, this matrix is also non-singular for all $x \in B_r(a)$. This proves part (a).

Given two distinct points $x, y \in B_r(a)$, we set $k = \|y - x\| \ne 0$ and $u = (y - x)/k$. Then $u$ is a unit vector and the function $\phi$, defined by
\[ \phi(t) = u \cdot G(x + tu), \]
is a real-valued differentiable function on an open interval containing $[0, k]$. By the Mean Value Theorem, there is an $s \in [0, k]$ at which
\[ k\phi'(s) = \phi(k) - \phi(0). \]
By the Chain Rule, $k\phi'(s) = k\,u \cdot dG(x + su)u$ and $\phi(k) - \phi(0) = u \cdot (G(y) - G(x))$. Thus,
\[ k\,u \cdot dG(c)u = u \cdot (G(y) - G(x)), \]
where $c = x + su$. Then, by (9.6.1),
\[ mk/2 < k\,u \cdot dG(c)u = u \cdot (G(y) - G(x)) \le \|u\|\,\|G(y) - G(x)\| \le \|B\|\,\|F(y) - F(x)\|, \tag{9.6.2} \]
which, since $k = \|y - x\|$, implies
\[ \|y - x\| \le \frac{2\|B\|}{m}\,\|F(y) - F(x)\|. \]
This concludes the proof of part (b) if we set $M = 2\|B\|/m$.

Part (c), that $F$ is one-to-one on $B_r(a)$, follows immediately from part (b), which shows that, for $x, y \in B_r(a)$, $x = y$ whenever $F(x) = F(y)$. □




Open Mapping Theorem. An open map is a function F such that F(V) is 
open whenever V is open. 


Theorem 9.6.3. With $F$ as above, if $dF$ is non-singular at every point of an open subset $V$ of $U$, then $F : V \to \mathbb{R}^p$ is an open map.


Proof. Given $a \in V$, set $b = F(a)$. We will show that $F(V)$ contains an open ball centered at $b$. If we can do this for every $a \in V$, then $F(V)$ is open. The same argument can be applied to every open subset of $V$ and, hence, we may conclude that $F$ is an open map.

The fact that $dF(a)$ is non-singular implies there is an open ball $B_r(a) \subset V$ for which the conclusions of the previous theorem hold. We will show that the image of this ball contains an open ball $B_\delta(b)$.

Let $r_1$ be a positive number less than $r$. Then part (b) of the previous theorem implies that there is a positive number $M$ such that
\[ \|x - y\| \le M\|F(x) - F(y)\| \quad \text{for all } x, y \in \overline{B}_{r_1}(a). \]
Since $b = F(a)$, this implies, in particular, that
\[ \|F(x) - b\| \ge \frac{r_1}{M} \quad \text{whenever } \|x - a\| = r_1. \tag{9.6.3} \]
We set $\delta = \dfrac{r_1}{2M}$ and let $v$ be any element of $B_\delta(b)$. If
\[ g(x) = \|F(x) - v\| \quad \text{for } x \in \overline{B}_{r_1}(a), \]
then our objective is to show that $g(u) = 0$ for some $u$ in this ball.

We will first show that $g$ takes on its minimum value at an interior point of $\overline{B}_{r_1}(a)$. It does take on a minimum value, since $g$ is a continuous function on the compact set $\overline{B}_{r_1}(a)$ (Corollary 8.2.5). Thus, we need to show that it does not take on this minimum at a boundary point of $\overline{B}_{r_1}(a)$.

If $x$ is a boundary point of $\overline{B}_{r_1}(a)$, then $\|x - a\| = r_1$ and (9.6.3) applies. Also, $v \in B_\delta(b)$ means $\|b - v\| < \delta = \dfrac{r_1}{2M}$. Thus,
\[ g(x) = \|F(x) - v\| \ge \|F(x) - b\| - \|b - v\| > \frac{r_1}{M} - \frac{r_1}{2M} = \delta \]
on the boundary of $\overline{B}_{r_1}(a)$.

Since $g(a) = \|F(a) - v\| = \|b - v\| < \delta$, the function $g(x)$ does not achieve its minimum value on the boundary of $\overline{B}_{r_1}(a)$. Hence, it must achieve its minimum value at a point $u$ in the open ball $B_{r_1}(a)$. Then $g^2(x) = (F(x) - v) \cdot (F(x) - v)$ has a local minimum at $u$ and, hence, its differential vanishes at $u$, by Theorem 9.5.7. By Theorem 9.3.6, its differential is $2(F(x) - v)\,dF(x)$. This expression vanishes at $u$ if and only if $F(u) - v$ is orthogonal to all the columns of $dF(u)$. Since $dF(u)$ is non-singular, by Theorem 9.6.2(a), this can happen only if $F(u) - v = 0$. Hence, we have shown that each $v \in B_\delta(b)$ is the image under $F$ of some $u \in B_r(a)$, as required. □




The Inverse Function and Its Differential. With $F$ as above, if $F$ is one-to-one with a non-singular differential on an open subset $V$ of $U$, then $F(V) = W$ is also open, by the previous theorem. In this situation, $F$ has an inverse function $F^{-1} : W \to V$ defined by the condition that, for each $y \in W$, $F^{-1}(y)$ is the unique $x \in V$ such that $F(x) = y$.

Theorem 9.6.4. With $F$, $V$, and $W$ as above, the inverse function $F^{-1} : W \to V$ is a smooth function on $W$ with differential given by
\[ dF^{-1}(b) = (dF(a))^{-1} = (dF(F^{-1}(b)))^{-1} \tag{9.6.4} \]
for each $b \in W$. Here $a = F^{-1}(b) \in V$.


Proof. Given $b \in W$ and $a = F^{-1}(b)$, we choose $r$ as in Theorem 9.6.2 and we choose it small enough that $B_r(a) \subset V$. Then $F(B_r(a))$ is also open, by the previous theorem.

If $y \in F(B_r(a))$ and $x = F^{-1}(y)$, then $x \in B_r(a)$. By the choice of $r$, the inequality in part (b) of Theorem 9.6.2 holds for $x$ and $a$ and says that
\[ \|F^{-1}(y) - F^{-1}(b)\| = \|x - a\| \le M\|y - b\|. \]
This implies that $F^{-1}$ is continuous at $b$. We calculate the differential of $F^{-1}$ at $b$ as follows:

The fact that $F$ is differentiable at $a$ means that if we set
\[ \epsilon(x) = F(x) - F(a) - dF(a)(x - a), \tag{9.6.5} \]
then
\[ \lim_{x \to a} \frac{\|\epsilon(x)\|}{\|x - a\|} = 0. \]
If we apply the matrix $(dF(a))^{-1}$ to both sides of (9.6.5) and use $a = F^{-1}(b)$, $x = F^{-1}(y)$, the result is
\[ (dF(a))^{-1}\epsilon(x) = (dF(a))^{-1}(y - b) - (F^{-1}(y) - F^{-1}(b)), \]
or
\[ F^{-1}(y) - F^{-1}(b) - (dF(a))^{-1}(y - b) = -(dF(a))^{-1}\epsilon(x). \]
If we set $K = \|(dF(a))^{-1}\|$, then
\[ \frac{\|F^{-1}(y) - F^{-1}(b) - (dF(a))^{-1}(y - b)\|}{\|y - b\|} \le \frac{K\|\epsilon(x)\|}{\|y - b\|} \le \frac{KM\|\epsilon(x)\|}{\|x - a\|}. \]
Since $F^{-1}$ is continuous at $b$, $x = F^{-1}(y)$ approaches $a = F^{-1}(b)$ as $y$ approaches $b$. Thus, the right side of the above inequality approaches $0$ as $y \to b$. By definition, this means that $F^{-1}$ is differentiable at $b$ and
\[ dF^{-1}(b) = (dF(a))^{-1} = (dF(F^{-1}(b)))^{-1}. \]
The partial derivatives of the coordinate functions of $F^{-1}$ are the entries of its differential matrix $dF^{-1}$, which we just concluded is given by (9.6.4). Since $F^{-1}$ is continuous on $W$, the entries of $dF(\cdot)$ (the partial derivatives of the coordinate functions of $F$) are continuous on $V$, and the determinant of $dF(\cdot)$ is continuous and non-vanishing on $V$, we conclude that the partial derivatives of the coordinate functions of $F^{-1}$ are continuous on $W$. This means that $F^{-1}$ is $C^1$, as claimed. This completes the proof. □




The Inverse Function Theorem. The proof of the Inverse Function Theorem 
is now just a matter of combining the previous three theorems. 


Theorem 9.6.5. Let $U$ be an open subset of $\mathbb{R}^p$ and let $F : U \to \mathbb{R}^p$ be a smooth function. If $a \in U$ and $\det dF(a) \ne 0$, then $F$ has a smooth local inverse function near $a$, with differential given by (9.6.4).

Proof. By Theorem 9.6.2, $F$ is one-to-one with a non-singular differential in an open ball $B_r(a)$. By Theorem 9.6.3, the image of $B_r(a)$ under $F$ is an open set $W$. Then $F$ has an inverse function $F^{-1} : W \to B_r(a)$ and, by Theorem 9.6.4, the inverse function is smooth with differential as claimed. □


Example 9.6.6. Find all points $a = (r, \theta) \in \mathbb{R}^2$ such that the polar change of coordinates function
\[ F(r, \theta) = (r\cos\theta,\; r\sin\theta) \]
has a smooth local inverse near $a$. Find the inverse and its differential near one such point.

Solution: The differential of $F$ is
\[ dF(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}. \]
The determinant of this matrix is $r$, and so $dF$ is non-singular everywhere except at $r = 0$. By the previous theorem, this implies that $F$ has a smooth local inverse near each $a = (r, \theta)$ with $r \ne 0$.

If we choose the point $a = (1, 0)$, then $F(a) = (1, 0)$. If $V$ is the neighborhood of $a$ defined by
\[ V = \{(r, \theta) : r > 0,\; -\pi/2 < \theta < \pi/2\} \]
and $W$ is the neighborhood of $b = F(a)$ defined by
\[ W = \{(x, y) : x > 0\}, \]
then
\[ F^{-1}(x, y) = \left( \sqrt{x^2 + y^2},\; \tan^{-1}(y/x) \right) \tag{9.6.6} \]
defines the inverse function $F^{-1} : W \to V$.

The inverse matrix $(dF(r, \theta))^{-1}$ of the differential matrix $dF(r, \theta)$ of $F$ is
\[ \begin{pmatrix} \cos\theta & \sin\theta \\ -r^{-1}\sin\theta & r^{-1}\cos\theta \end{pmatrix}. \]
By the previous theorem, this is the differential of the inverse function $F^{-1}$ at the point $(x, y) = F(r, \theta)$. If we express $r$ and $\theta$ in terms of $x$ and $y$ using (9.6.6), we obtain
\[ dF^{-1}(x, y) = \begin{pmatrix} \dfrac{x}{\sqrt{x^2 + y^2}} & \dfrac{y}{\sqrt{x^2 + y^2}} \\[2ex] \dfrac{-y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{pmatrix}. \]
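The computations of Example 9.6.6 can be spot-checked numerically. The following Python sketch is illustrative only: it evaluates the polar map, the local inverse (9.6.6), and the matrix claimed to be $(dF(r, \theta))^{-1}$ at one sample point of $V$, then verifies the round trip and that the product of the two matrices is the identity. The function names and the sample point are hypothetical choices.

```python
import math

def F(r, theta):
    """Polar change of coordinates."""
    return (r * math.cos(theta), r * math.sin(theta))

def F_inv(x, y):
    """Local inverse (9.6.6), valid on W = {(x, y) : x > 0}."""
    return (math.hypot(x, y), math.atan(y / x))

def dF(r, theta):
    """Differential matrix of F at (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s], [s, r * c]]

def dF_inv(r, theta):
    """Claimed inverse matrix of dF(r, theta), for r != 0."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s / r, c / r]]

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r0, theta0 = 1.7, 0.6                    # an arbitrary sample point of V
round_trip = F_inv(*F(r0, theta0))       # should return (r0, theta0)
prod = matmul(dF(r0, theta0), dF_inv(r0, theta0))

# F is not one-to-one globally: (1, 0) and (1, 2*pi) have the same image.
p0, p1 = F(1.0, 0.0), F(1.0, 2 * math.pi)
```

The last two lines illustrate why the inverse is only local: two different points of the $(r, \theta)$ plane are sent to (numerically) the same point.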


Note that the function $F$ of the above example is definitely not one-to-one on all of $\mathbb{R}^2$ or on $\{(r, \theta) \in \mathbb{R}^2 : r \ne 0\}$ and so, as a function with either of these sets as domain, it does not have an inverse function. It is only when we restrict the domain of $F$ to a set like the set $V$ in the above example that it has an inverse function. What are some other sets $V$ with the property that the restriction of $F$ to the set $V$ has an inverse function? This question is left to the exercises.


Exercise Set 9.6 


1. Use the Inverse Function Theorem to determine the points of $\mathbb{R}$ near which the $\sin$ function has a smooth local inverse function. What is the derivative of the inverse function when it exists?

2. Show that the function $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(x, y) = (x^2 + y^2, xy)$ has a smooth local inverse near points $(x, y)$ where $x \ne \pm y$. Find the inverse function $F^{-1}$ on the set $\{(x, y) : -x < y < x\}$ and identify its domain. Calculate the differential of this inverse function (1) directly and (2) by using the Inverse Function Theorem. Verify that the two methods give the same answer.

3. Near which points of $\mathbb{R}^3$ does the spherical change of coordinates function
\[ F(\rho, \theta, \phi) = (\rho\cos\theta\sin\phi,\; \rho\sin\theta\sin\phi,\; \rho\cos\phi) \]
have a smooth local inverse? What is the differential of the local inverse at those points where it exists? To avoid tedious computation, express this in terms of $(\rho, \theta, \phi)$ rather than in terms of the image variables $(x, y, z) = F(\rho, \theta, \phi)$.


4. Show that the system of equations


ul —ututv?, 
y =cosu+sinv 
can be solved for (u,v) as a smooth function F of (x,y), in some neighborhood 


of (0,0), in such a way that (u,v) = (0,0) when («,y) = (0,1). What is the 
differential of the resulting function F at (0, 1)? 


5. Find a smooth local inverse function near $(1, \pi/2)$ for the function $F$ of Example 9.6.6.

6. Find a smooth local inverse function near $(1, 2\pi)$ for the function $F$ of Example 9.6.6. Note that this is different from the inverse function found in the example, even though the point $b = F(a)$ is the same in both cases.

7. Show that if $U$ is a convex open subset of $\mathbb{R}^p$ and $F : U \to \mathbb{R}^p$ is a $C^1$ function on $U$ with a differential $dF$ which is positive definite at every point of $U$, then $F$ is one-to-one. Hint: Examine the role played by the function $\phi$ in the proof of Theorem 9.6.2.

8. Show by example that the result of the previous problem is not true if $U$ is only assumed to be connected rather than convex. Hint: Try the function $F(x, y) = (x^2 - y^2, 2xy)$ on $\mathbb{R}^2 \setminus \{0\}$.




9. Show that if F = (fi, fo) : R3 3 R? is a @! function and a is a point of R® 
at which dF has rank 2, then there is a @! function fj ; R? 3 R such that 
® = (fi, fo, fs): B® 4 R® has a C! inverse function near a. 


10. Show that the condition that dF(a) be non-singular is necessary in the Inverse 
Function Theorem by showing that if a function F from a neighborhood of a 
in R? to RP is differentiable at a and has an inverse function at a which is 
differentiable at F(a), then dF(a) is non-singular. 


11. Let $\gamma : I \to \mathbb{R}^3$ be a smooth parameterized curve, defined on an open interval $I$, and let $t_0$ be a point of $I$ with $\gamma'(t_0) \ne 0$. Prove that there are neighborhoods $U \subset I$ of $t_0$ and $V$ of $\gamma(t_0)$ and a pair $f, g$ of $C^1$ functions defined in $V$ such that the image of $U$ under $\gamma$ is the set of solutions in $V$ of the system of equations $f(x,y,z) = 0$, $g(x,y,z) = 0$. Hint: Show that there is a $C^1$ function $F$ from a neighborhood of $(t_0, 0, 0)$ in $\mathbb{R}^3$ to $\mathbb{R}^3$ with $F(t,0,0) = \gamma(t)$ and with $dF(t_0, 0, 0)$ non-singular. Then apply the Inverse Function Theorem to $F$. The functions $f$ and $g$ are then two of the coordinate functions of $F^{-1}$.

12. If $F : \mathbb{R}^p \to \mathbb{R}^p$ is a $C^1$ function, what can you say about $F$ at a point of $\mathbb{R}^p$ where $\|F\|$ has a local minimum? How about a point where $\|F\|$ has a local maximum?


9.7. The Implicit Function Theorem 


In this section we continue to develop consequences of the Inverse Function Theo- 
rem. The most notable of these is the Implicit Function Theorem. First we interpret 
the Inverse Function Theorem in the context of local systems of coordinates. 


Local Systems of Coordinates. Let $F$ be a smooth function defined on an open subset $U$ of $\mathbb{R}^p$ which has values in $\mathbb{R}^p$ and which has a smooth local inverse near a point $a \in U$. Then there is a neighborhood $V$ of $a$ and a neighborhood $W$ of $b = F(a)$ such that $F : V \to W$ is one-to-one and onto and has a smooth inverse function $G = F^{-1} : W \to V$.


We define a change of coordinates for points in $V$ as follows: if
$$F = (f_1, f_2, \dots, f_p),$$
then we define new coordinates $(u_1, u_2, \dots, u_p)$ for a point $x = (x_1, x_2, \dots, x_p)$ in $V$ by setting
$$u_i = f_i(x_1, x_2, \dots, x_p) \quad \text{for } i = 1, \dots, p.$$
The coordinates $u_1, \dots, u_p$ are smooth functions of the old coordinates $x_1, \dots, x_p$ and, similarly, the old coordinates are smooth functions of the new coordinates since
$$x_j = g_j(u_1, u_2, \dots, u_p) \quad \text{for } j = 1, \dots, p,$$
where $g_j$ is the $j$th coordinate function of the inverse function $G$.

By subtracting the constant b from F, if necessary, we may assume that F(a) = 
0 and W is a neighborhood of 0. This just makes the point a the origin in the new 
coordinate system. 




A coordinate hyperplane (intersected with $W$) in the new coordinates is a set of the form
$$H_i = \{u \in W : u_i = 0\}.$$
In the original coordinates, this is the set
$$\{x \in V : f_i(x) = 0\}.$$

This means that the level set $\{x \in V : f_i(x) = 0\}$ for the function $f_i$ looks like a smoothly deformed hyperplane (intersected with $V$). Similarly, the subset obtained by setting $k$ of the coordinates $\{u_1, \dots, u_p\}$ equal to zero is a $(p-k)$-dimensional subspace of $\mathbb{R}^p$. In the old coordinates this looks like a smoothly deformed $(p-k)$-subspace intersected with $V$. If $k = p - 1$, the result is a line through the origin in the new coordinates and a curve through $a$ in the old coordinates.


Parameterizing a Curve. A key question raised in the last subsection of Section 
9.4 is: when does a level set for a smooth function from one Euclidean space to 
another locally have a smooth parameterization and, hence, a tangent space at each 
of its points? The following example gives an answer to this question in the case of 
a level set for a real-valued function on $\mathbb{R}^2$. The method used in this example is a
model for the proof of the Implicit Function Theorem, which will be proved next. 


Example 9.7.1. Show that if $f : \mathbb{R}^2 \to \mathbb{R}$ is a smooth function and $(a,b)$ is a point of $\mathbb{R}^2$ such that $f(a,b) = 0$ and $df(a,b) \ne 0$, then there is a neighborhood $V$ of $(a,b)$ in which $S = \{(x,y) : f(x,y) = 0\}$ is the image of a smooth parameterized curve. Find the tangent line to this curve at $(a,b)$.


Solution: Since $df(a,b) \ne 0$, either $\dfrac{\partial f}{\partial x}$ or $\dfrac{\partial f}{\partial y}$ is non-zero at $(a,b)$. Assume $\dfrac{\partial f}{\partial y}(a,b) \ne 0$ (the analysis in the other case is the same, but with the roles of $x$ and $y$ reversed). We define a function $H : \mathbb{R}^2 \to \mathbb{R}^2$ by
$$H(x,y) = (x, f(x,y)).$$
The differential matrix of this function is
$$\begin{pmatrix} 1 & 0 \\[4pt] \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix},$$
which has determinant $\dfrac{\partial f}{\partial y}$. Since $\dfrac{\partial f}{\partial y}(a,b) \ne 0$, this matrix is non-singular at $(a,b)$. Hence, there is a neighborhood $V$ of $(a,b)$, a neighborhood $W$ of $(a,0)$, and a smooth inverse function $H^{-1} : W \to V$ for $H$. We have
$$H^{-1}(x,0) = (k(x), g(x)),$$
for some smooth real-valued functions $k$, $g$, defined for all $x$ with $(x,0) \in W$. Then,
$$(x,0) = H \circ H^{-1}(x,0) = (k(x), f(k(x), g(x))) \quad \text{whenever } (x,0) \in W.$$

It follows that $k(x) = x$ and $f(x, g(x)) = 0$ for all such $x$. On the other hand, if $(x,y) \in V$ and $f(x,y) = 0$, then $H(x,y) = (x,0)$ and so
$$(x,y) = H^{-1} \circ H(x,y) = H^{-1}(x,0) = (x, g(x)).$$




Thus, $y = g(x)$. We conclude that, for $(x,y) \in V$, $f(x,y) = 0$ if and only if $y = g(x)$. Since $(a,b) \in V$ and $f(a,b) = 0$, this means, in particular, that $g(a) = b$. Thus, we have proved that, near $(a,b)$, $S$ is the graph of the smooth function $g$ and
$$\gamma(x) = (x, g(x))$$
is a smooth parameterization of $S$ near $(a,b)$.

The tangent line to $S$ at $(a,b)$ is given parametrically by
$$T(x) = (a,b) + \gamma'(a)(x - a) = (a,b) + (1, g'(a))(x - a) = (x, b + g'(a)(x - a)),$$
where, since $f(x, g(x)) = 0$, the Chain Rule tells us that
$$g' = -\left(\frac{\partial f}{\partial y}\right)^{-1} \frac{\partial f}{\partial x}.$$
The tangent line can also be described as the set of all $(x,y)$ such that $(x - a, y - b)$ is orthogonal to the gradient of $f$ at $(a,b)$, that is, all solutions to the equation
$$\frac{\partial f}{\partial x}(a,b)(x - a) + \frac{\partial f}{\partial y}(a,b)(y - b) = 0.$$
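As a numerical illustration of Example 9.7.1 (not part of the text's argument), the following Python sketch takes the hypothetical choice $f(x,y) = x^2 + y^2 - 1$ with $(a,b) = (0.6, 0.8)$, recovers $g(x)$ by Newton's method in $y$, and compares a finite-difference estimate of $g'(a)$ with the Chain Rule formula. The function names, the step size `h`, and the use of Newton's method are assumptions made only for this illustration.

```python
# Hypothetical example: the unit circle f(x, y) = x^2 + y^2 - 1 near (0.6, 0.8).

def f(x, y):
    return x * x + y * y - 1.0

def f_x(x, y):
    return 2.0 * x          # partial derivative of f with respect to x

def f_y(x, y):
    return 2.0 * y          # partial derivative of f with respect to y

def solve_g(x, y_start, steps=50):
    """Newton iteration in y for f(x, y) = 0, started at y_start."""
    y = y_start
    for _ in range(steps):
        y -= f(x, y) / f_y(x, y)
    return y

a, b = 0.6, 0.8             # f(a, b) = 0 and f_y(a, b) != 0
h = 1e-6                    # finite-difference step (an arbitrary small value)

g_a = solve_g(a, b)                                            # should equal b
slope_fd = (solve_g(a + h, b) - solve_g(a - h, b)) / (2 * h)   # estimate of g'(a)
slope_formula = -f_x(a, b) / f_y(a, b)                         # g' = -(f_y)^(-1) f_x

print(g_a, slope_fd, slope_formula)
```

The two slopes should agree up to the finite-difference error, and the tangent line is then $T(x) = (x, b + g'(a)(x - a))$ as in the example.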


The Implicit Function Theorem. The proof of the Implicit Function Theorem
follows exactly the same pattern as the solution to the preceding example.


The Implicit Function Theorem provides the answer to a very simple question: 
when can an equation of the form 


$$F(x,y) = 0$$

be solved for $y$ as a function of $x$? That is, when can we find a function $g$ such that $F(x, g(x)) = 0$? We note several things about this problem:


(1) The problem makes perfectly good sense if $F$ is a real-valued function of two real variables (as in the previous example), but it also makes sense if $F$ is a vector-valued function of variables $x$ and $y$ which are also vectors.

(2) As was the case with the Inverse Function Theorem, we might expect that there are local solutions to this problem for $(x,y)$ near a point $(a,b)$ where $F(a,b) = 0$, even though global solutions may not be possible.


(3) Whether such a local solution is possible near a given point may depend on 
conditions on the differential matrix of F at the point. 


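In the statement and the proof of the Implicit Function Theorem, we will need to deal with certain submatrices of the full differential matrix of a function $F$. In this regard, the following notation will be useful. If $f_1, f_2, \dots, f_k$ are smooth functions defined on an open set $U$ in some Euclidean space $\mathbb{R}^d$ (these may be some or all of the coordinate functions of a vector function $F$ defined on $U$) and if $y_1, \dots, y_m$ are some of the coordinates describing points in $\mathbb{R}^d$, then we set
$$\frac{\partial(f_1, \dots, f_k)}{\partial(y_1, \dots, y_m)} =
\begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_m} \\[6pt]
\dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_m} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_k}{\partial y_1} & \dfrac{\partial f_k}{\partial y_2} & \cdots & \dfrac{\partial f_k}{\partial y_m}
\end{pmatrix}.$$
If $F = (f_1, \dots, f_q) : U \to \mathbb{R}^q$ is a function on a subset $U$ of $\mathbb{R}^p$ with the coordinates in $\mathbb{R}^p$ labeled $x_1, \dots, x_p$, then $\dfrac{\partial(f_1, \dots, f_q)}{\partial(x_1, \dots, x_p)}$ is just another notation for $dF$. However, we will want to use this notation in cases where only some of the coordinate functions and/or some of the variables of $F$ are used.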


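In the following theorem, $\mathbb{R}^{p+q}$ will be identified with $\mathbb{R}^p \times \mathbb{R}^q$ and points in this space will be expressed in the form $(x,y) = (x_1, \dots, x_p, y_1, \dots, y_q)$.

Theorem 9.7.2. Let $U \subset \mathbb{R}^{p+q}$ be open, let $F = (f_1, \dots, f_q) : U \to \mathbb{R}^q$ be a smooth function, and let $(a,b)$ be a point of $U$ with $F(a,b) = 0$. Also, suppose the square matrix
$$\frac{\partial(f_1, \dots, f_q)}{\partial(y_1, \dots, y_q)}$$
is non-singular at $(a,b)$. Then there are neighborhoods $V \subset U$ of $(a,b)$ and $A$ of $a$ and a smooth function $G : A \to \mathbb{R}^q$ such that $(x, G(x)) \in V$ for all $x \in A$, $G(a) = b$, and
$$F(x,y) = 0 \text{ for } (x,y) \in V \quad \text{if and only if} \quad y = G(x).$$

Furthermore, the differential of $G$ on $A$ is given by
$$dG = \frac{\partial(g_1, \dots, g_q)}{\partial(x_1, \dots, x_p)} = -\left(\frac{\partial(f_1, \dots, f_q)}{\partial(y_1, \dots, y_q)}\right)^{-1} \frac{\partial(f_1, \dots, f_q)}{\partial(x_1, \dots, x_p)}. \tag{9.7.1}$$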


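Proof. We will prove this by applying the Inverse Function Theorem to another function $H$, constructed from $F$. We define $H : U \to \mathbb{R}^p \times \mathbb{R}^q$ by
$$H(x,y) = (x, F(x,y)) \quad \text{for } (x,y) \in U.$$
The function $H$ is $C^1$ on $U$ because $F$ is $C^1$. The differential of $H$ is
$$dH = \begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\[4pt]
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_p} & \dfrac{\partial f_1}{\partial y_1} & \cdots & \dfrac{\partial f_1}{\partial y_q} \\
\vdots & & \vdots & \vdots & & \vdots \\
\dfrac{\partial f_q}{\partial x_1} & \cdots & \dfrac{\partial f_q}{\partial x_p} & \dfrac{\partial f_q}{\partial y_1} & \cdots & \dfrac{\partial f_q}{\partial y_q}
\end{pmatrix},$$
with an identity matrix in the upper left $p \times p$ block and a 0 matrix in the upper right $p \times q$ block. The bottom $q$ rows form the differential matrix $dF$ for $F$. The determinant of $dH$ is just the determinant of the lower right $q \times q$ block, that is, the determinant of
$$\frac{\partial(f_1, \dots, f_q)}{\partial(y_1, \dots, y_q)}.$$
This determinant is non-zero at $(a,b)$ by hypothesis. Hence, $dH$ also has a non-zero determinant at $(a,b)$ and is, therefore, non-singular at this point.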


By the Inverse Function Theorem (Theorem 9.6.5) there are neighborhoods $V \subset U$ of $(a,b)$ and $W$ of $H(a,b)$ such that $H$ has a smooth inverse function $H^{-1} : W \to V$. We have
$$H^{-1}(x,0) = (K(x), G(x)),$$
for some smooth functions $K$ and $G$, defined on $A = \{x \in \mathbb{R}^p : (x,0) \in W\}$ with values in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively. The set $A$ is open because it is the inverse image of $W$ under the continuous function $x \mapsto (x,0) : \mathbb{R}^p \to \mathbb{R}^p \times \mathbb{R}^q$. Furthermore,
$$(x,0) = H \circ H^{-1}(x,0) = (K(x), F(K(x), G(x))) \quad \text{whenever } x \in A.$$
Thus, $K(x) = x$ and $F(x, G(x)) = 0$ for all $x \in A$. On the other hand, if $(x,y) \in V$ and $F(x,y) = 0$, then $H(x,y) = (x,0)$ and so
$$(x,y) = H^{-1} \circ H(x,y) = H^{-1}(x,0) = (x, G(x)).$$

Thus, $y = G(x)$. We conclude that if $(x,y) \in V$, then $F(x,y) = 0$ if and only if $y = G(x)$. This applies, in particular, when $(x,y) = (a,b)$ and so $G(a) = b$.

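If we take the differential of both sides of the equation $F(x, G(x)) = 0$, the result is
$$\frac{\partial(f_1, \dots, f_q)}{\partial(x_1, \dots, x_p)} + \frac{\partial(f_1, \dots, f_q)}{\partial(y_1, \dots, y_q)} \, \frac{\partial(g_1, \dots, g_q)}{\partial(x_1, \dots, x_p)} = 0.$$
On solving this for $\dfrac{\partial(g_1, \dots, g_q)}{\partial(x_1, \dots, x_p)}$ we obtain (9.7.1). □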


The Implicit Function Theorem leads to conditions under which a level set of 
a function has a smooth parameterization and, hence, a tangent space. This is the 
issue raised at the end of Section 9.4. This is also a key issue in the hypotheses of 
the theorem concerning the method of Lagrange multipliers (Theorem 9.5.15).


Corollary 9.7.3. Let $U \subset \mathbb{R}^d$ be an open set and let $F : U \to \mathbb{R}^q$ be a smooth function. Suppose $c \in U$, $F(c) = 0$, and $dF(c)$ has rank $q$. Then there is a neighborhood $V$ of $c$, $V \subset U$, such that the level set $S = \{u \in V : F(u) = 0\}$ is a smooth $p$-surface, where $p = d - q$. That is, $S$ has a smooth parameterization of dimension $p$. Hence, $S$ has a tangent space at each point of $S$. Furthermore, the tangent space at $c$ is the set of solutions $u$ to the equation
$$dF(c)(u - c) = 0.$$


Proof. Since $dF(c)$ has rank $q$, there is a $q \times q$ submatrix of the $q \times d$ matrix $dF(c)$ which is non-singular. By rearranging the variables in $F$, if necessary, we may assume that the last $q$ columns of $dF$ form a non-singular matrix. With $p = d - q$, we may represent $\mathbb{R}^d$ as $\mathbb{R}^p \times \mathbb{R}^q$ and label the variables by $(x,y) = (x_1, \dots, x_p, y_1, \dots, y_q)$, as in the preceding theorem. Then the hypotheses of that theorem are satisfied, with $c = (a,b)$.

By the Implicit Function Theorem, there are neighborhoods $V$ of $c = (a,b)$ and $A$ of $a$ and a smooth function $G : A \to \mathbb{R}^q$ with $(x, G(x)) \in V$ for all $x \in A$ and such that $F(x,y) = 0$ for $(x,y) \in V$ if and only if $y = G(x)$.

Thus, $S = \{u = (x,y) \in V : F(u) = 0\}$ is the graph of the smooth function $G$. Then the function $H(x) = (x, G(x))$ is a smooth parameterization of $S$. □


Example 9.7.4. For the system of equations
$$u^2 + v^2 - x = 0,$$
$$u + v + y = 0,$$
find the points on the solution set $S$ at which it may not be possible to solve for $u$ and $v$ as smooth functions of $x$ and $y$ in some neighborhood of the point.

Solution: According to the Implicit Function Theorem, there will be smooth solutions in a neighborhood of any point where the following matrix is non-singular:
$$\frac{\partial(f_1, f_2)}{\partial(u, v)} = \begin{pmatrix} 2u & 2v \\ 1 & 1 \end{pmatrix},$$
where $f_1(x,y,u,v) = u^2 + v^2 - x$ and $f_2(x,y,u,v) = u + v + y$. This matrix is singular only when $u = v$. This happens at a point on $S$ if and only if $u = v$ and $y^2 = 2x$.
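Formula (9.7.1) can be checked numerically on this system at a point where $u \ne v$. The Python sketch below is an illustration only: the base point $(x,y,u,v) = (5,-3,2,1)$, the function names, the Newton iteration, and the step size `h` are all assumptions chosen for the example. It solves for $(u,v)$ near the base point and compares finite-difference partial derivatives with $-\left(\partial(f_1,f_2)/\partial(u,v)\right)^{-1}\partial(f_1,f_2)/\partial(x,y)$.

```python
def F(x, y, u, v):
    """The system of Example 9.7.4: (f1, f2)."""
    return (u * u + v * v - x, u + v + y)

def solve_uv(x, y, u, v, steps=50):
    """Newton's method in (u, v) for F(x, y, u, v) = 0, started at (u, v)."""
    for _ in range(steps):
        f1, f2 = F(x, y, u, v)
        det = 2.0 * u - 2.0 * v            # determinant of [[2u, 2v], [1, 1]]
        # Solve [[2u, 2v], [1, 1]] (du, dv)^T = (f1, f2)^T by Cramer's rule.
        du = (f1 - 2.0 * v * f2) / det
        dv = (2.0 * u * f2 - f1) / det
        u, v = u - du, v - dv
    return u, v

def dG_formula(u, v):
    """Right side of (9.7.1) for this system, as a 2x2 list of rows."""
    det = 2.0 * u - 2.0 * v
    inv = [[1.0 / det, -2.0 * v / det],    # inverse of [[2u, 2v], [1, 1]]
           [-1.0 / det, 2.0 * u / det]]
    dF_dxy = [[-1.0, 0.0], [0.0, 1.0]]     # d(f1, f2)/d(x, y)
    return [[-sum(inv[i][k] * dF_dxy[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x0, y0, u0, v0 = 5.0, -3.0, 2.0, 1.0       # a point of S with u != v
h = 1e-6

up, vp = solve_uv(x0 + h, y0, u0, v0)
um, vm = solve_uv(x0 - h, y0, u0, v0)
d_dx = [(up - um) / (2 * h), (vp - vm) / (2 * h)]   # finite-difference column d/dx

up, vp = solve_uv(x0, y0 + h, u0, v0)
um, vm = solve_uv(x0, y0 - h, u0, v0)
d_dy = [(up - um) / (2 * h), (vp - vm) / (2 * h)]   # finite-difference column d/dy

dG = dG_formula(u0, v0)
print(dG, d_dx, d_dy)
```

At a point with $u = v$ the determinant `det` vanishes and the iteration above breaks down, which matches the conclusion of the example.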


Recall that the kernel of an affine transformation $L : \mathbb{R}^p \to \mathbb{R}$ of rank 1 is a hyperplane in $\mathbb{R}^p$. The Implicit Function Theorem allows us to draw a similar conclusion for functions which are not affine.


Example 9.7.5. For the equation
$$x^2 + y^2 + z^3 = 0,$$
at which points on its solution set $S$ can we be assured that there is a neighborhood of the point in which $S$ is a smoothly parameterized surface? Find an equation of the tangent space at each such point.

Solution: By the corollary to the Implicit Function Theorem, there will be a smooth parameterization of $S$ in a neighborhood of any point at which $df$ has rank 1, where $f(x,y,z) = x^2 + y^2 + z^3$. Since
$$df(x,y,z) = (2x, 2y, 3z^2),$$
the only point at which such a parameterization may not be possible is the origin.

At any point $(a,b,c)$ of $S$ which is not the origin, an equation for the tangent space is
$$df(a,b,c)(x - a, y - b, z - c) = 0,$$
or
$$2a(x - a) + 2b(y - b) + 3c^2(z - c) = 0.$$


Exercise Set 9.7 


1. Are there any points on the graph of the equation x? + 3ry? + 2y? = 1 where it may not be possible to solve for $y$ as a smooth function of $x$ in some neighborhood of the point?


2. Can the equation $xz + yz + \sin(x + y + z) = 0$ be solved, in a neighborhood of $(0,0,0)$, for $z$ as a smooth function $z = g(x,y)$ of $(x,y)$, with $g(0,0) = 0$?
3. Find $\dfrac{\partial(f_1, f_2)}{\partial(u, v)}$ if
$$f_1(x,y,u,v) = u^2 + v^2 + x^2 + y^2,$$
$$f_2(x,y,u,v) = xu + yv + x - y.$$

At which points $(x,y,u,v)$ is this matrix non-singular?


4. Show that the system of equations


wort Qu-cy+z=0, 


w+sinv—cut+yot22=0 


has a solution for $(u,v)$ as a smooth function of $(x,y,z)$, in some neighborhood of $(0,0,0)$, with the property that $(u,v) = (0,0)$ when $(x,y,z) = (0,0,0)$.


5. Show that the system of equations


uw + 220? = Qy +h =8, 
vty —Ie+w=6, 
wt+we-y =e 


has a solution for $u, v, w$ as functions of $(x,y)$ in a neighborhood of the point $(x,y,u,v,w) = (1,1,1,1,0)$ with $u(1,1) = 1$, $v(1,1) = 1$, $w(1,1) = 0$.


6. For the equation $xy + yz + xz = 1$, at which points on the solution set $S$ is there a neighborhood in which $S$ is a smooth 2-surface? At each such point $(a,b,c)$, find an equation of the tangent plane.


7. For the system of equations


2 


ery 2 =6, 
atytz=@, 


at which points of the solution set $S$ is there a neighborhood in which $S$ is a smooth curve? At each such point, find an equation of the tangent line.


8. For the system of equations


ety? +w? —3v=1, 
Qe + ey —y + 3u? —9v= 


find all points on the solution set S for which there is a neighborhood in which 
S is a smooth 2-surface. 




9. If F(x, y, u,v) = (we" +ye", evt+yu) € R®, find those points (a, y, u,v) at which 
the level set of F, containing this point, is a smooth 2-surface in a neighborhood 
of the point. 

10. If $F : \mathbb{R}^p \to \mathbb{R}^q$ is a smooth function and $dF$ has rank $q$ at a certain point $a \in \mathbb{R}^p$, prove that there is a neighborhood of $a$ in which $dF$ has rank $q$.


Chapter 10 


Integration in 
Several Variables 


Integration theory for functions of several variables has much in common with integration for functions of a single variable. Many of the proofs are almost identical. However, there are some fundamental differences.

In one variable, we only have to worry about integrating over an interval. How- 
ever, in several variables the sets we integrate over can be much more complicated. 
There are issues concerning the boundary of the set and how large it can be. Such 
issues don’t arise in the theory of integration of a function of one variable. In one 
variable, the change of variable formula for integration (the substitution formula)
is quite simple and has a simple proof: it follows directly from the Chain Rule
for differentiation and the Fundamental Theorem of Calculus. The analogous formula
in several variables is much more complicated: it involves the determinant
of the differential of the change of variables transformation. Its proof is long and
complicated.


We begin with a definition of the integral of a function over a multidimensional 
rectangle. 


10.1. Integration over a Rectangle 


An aligned rectangle in $\mathbb{R}^d$ is a set of the form
$$R = [a_1, b_1] \times \cdots \times [a_d, b_d] = \{(x_1, \dots, x_d) \in \mathbb{R}^d : a_k \le x_k \le b_k,\ k = 1, \dots, d\}.$$


We call such a rectangle aligned because each of its edges is parallel to a coordinate 
axis. Unless otherwise specified, in this chapter the term rectangle will mean aligned 
rectangle. Note that in one dimension an aligned rectangle is just a closed bounded 
interval, while in two dimensions an aligned rectangle is an ordinary rectangle with 
sides parallel to the coordinate axes. 




The $d$-volume of a rectangle is the product of the lengths of its edges, that is, the $d$-volume $V(R)$ of the rectangle $R$ above is
$$V(R) = \prod_{k=1}^{d} (b_k - a_k).$$
Thus, the 1-volume of a rectangle (an interval) in $\mathbb{R}$ is its length; the 2-volume of a rectangle in $\mathbb{R}^2$ is its area. The 3-volume of a rectangle in $\mathbb{R}^3$ is its ordinary volume.


Note that it is possible for one of the intervals $[a_k, b_k]$ defining a rectangle in $\mathbb{R}^d$ to be degenerate, that is, it could be that $a_k = b_k$. In this case, the rectangle has $d$-volume 0. This makes sense, because it is actually a rectangle of dimension $d - 1$ in this case.

As long as the dimension of the ambient space $\mathbb{R}^d$ is understood, we will drop the $d$ and just refer to the $d$-volume of a rectangle as its volume.

An aligned partition $P$ of an aligned rectangle $R = [a_1, b_1] \times \cdots \times [a_d, b_d]$ is an assignment to each $k = 1, \dots, d$ of a partition
$$\{a_k = x_{0,k} \le x_{1,k} \le \cdots \le x_{n_k,k} = b_k\}$$
of the interval $[a_k, b_k]$. An aligned partition divides $R$ up into subrectangles of the form
$$[x_{j_1 - 1, 1}, x_{j_1, 1}] \times \cdots \times [x_{j_d - 1, d}, x_{j_d, d}] = \{(x_1, \dots, x_d) \in R : x_{j_k - 1, k} \le x_k \le x_{j_k, k},\ k = 1, \dots, d\}.$$


Each of these will be called a subrectangle for the partition $P$ of the rectangle $R$. If $n$ is the number of subrectangles for $P$, then we will number these subrectangles in some fashion so that we have a list $\{R_1, R_2, \dots, R_n\}$ of all the subrectangles for $P$. We will not attempt to arrange this numbering scheme in a way that has anything to do with the indexing of the points in the corresponding partitions of the individual intervals $[a_k, b_k]$. To do so would lead to an awful mess.


Note that R is the union of the subrectangles determined by a partition of R 
and any two of these subrectangles are either disjoint or have a lower-dimensional 
rectangle as intersection. The volume of R is the sum of the volumes of the sub- 
rectangles determined by the partition. 

Note also, for technical reasons, in this chapter we make a small change in the definition of a partition of an interval. We allow two successive points in a partition to be the same. That is, a partition of an interval $I$ now has the form $\{a = x_0 \le x_1 \le \cdots \le x_n = b\}$ rather than $\{a = x_0 < x_1 < \cdots < x_n = b\}$. This is a minor change which in no way affects the definition or properties of the integral, but it allows degenerate rectangles to have partitions.


Unless otherwise specified, in this chapter, the term partition will mean aligned 
partition. 


Upper and Lower Sums. Let $f$ be a bounded real-valued function defined on a rectangle $R$ and let $P$ be a partition of $R$ determining a list of subrectangles $R_1, R_2, \dots, R_n$.




Figure 10.1.1. Partition of a Rectangle. 


Definition 10.1.1. If $f$, $R$, $P$, and $\{R_1, R_2, \dots, R_n\}$ are as above, then we define the upper and lower sums for $f$ and $P$ by
$$U(f,P) = \sum_{j=1}^{n} M_j V(R_j), \qquad L(f,P) = \sum_{j=1}^{n} m_j V(R_j), \tag{10.1.1}$$
where
$$M_j = \sup_{R_j} f \quad \text{and} \quad m_j = \inf_{R_j} f.$$


This is exactly the way we defined the upper and lower sums for f and the 
partition P in Definition 5.1.1, except there we were partitioning intervals into 
subintervals and here we are partitioning d-dimensional rectangles into subrectan- 
gles. 


As in Section 5.1, a Riemann sum for $f$ and $P$ on $R$ is a sum of the form
$$\sum_{j=1}^{n} f(u_j) V(R_j), \tag{10.1.2}$$
where, for each $j$, $u_j$ is some point in the rectangle $R_j$. For each $j$, the term $f(u_j)V(R_j)$ represents the volume (or minus the volume, if $f(u_j) < 0$) of a $(d+1)$-dimensional rectangle with base $R_j$ and with height $|f(u_j)|$. Now, for each $j$ we have
$$m_j \le f(u_j) \le M_j,$$
which implies
$$L(f,P) \le \sum_{j=1}^{n} f(u_j) V(R_j) \le U(f,P).$$




Thus, as in Section 5.1, every Riemann sum for f and P lies between the lower and 
upper sums for f and P. 
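The sums in Definition 10.1.1 are easy to compute when the sup and inf on each subrectangle are known. The following Python sketch is an illustration under a simplifying assumption that is not made in the text: $f$ is non-decreasing in each variable, so that $M_j$ and $m_j$ are the values of $f$ at the upper and lower corners of $R_j$. The function names and the choice $f(x,y) = xy$ on $[0,1] \times [0,1]$ with uniform partitions are hypothetical.

```python
from fractions import Fraction
from itertools import product

def upper_lower_sums(f, parts):
    """Return (U(f,P), L(f,P)) for an aligned partition.

    parts holds one sorted list of partition points per coordinate.
    Assumes f is non-decreasing in each variable, so the sup and inf on a
    subrectangle are attained at its upper and lower corners.
    """
    upper = lower = Fraction(0)
    for idx in product(*[range(1, len(p)) for p in parts]):
        lo = [p[j - 1] for p, j in zip(parts, idx)]   # lower corner of R_j
        hi = [p[j] for p, j in zip(parts, idx)]       # upper corner of R_j
        vol = Fraction(1)
        for a, b in zip(lo, hi):
            vol *= b - a                              # V(R_j)
        upper += f(*hi) * vol
        lower += f(*lo) * vol
    return upper, lower

def uniform(n):
    """Uniform partition of [0, 1] into n subintervals."""
    return [Fraction(k, n) for k in range(n + 1)]

f = lambda x, y: x * y
U2, L2 = upper_lower_sums(f, [uniform(2), uniform(2)])
U8, L8 = upper_lower_sums(f, [uniform(8), uniform(8)])
print(U2, L2, U8, L8)
```

Any Riemann sum (10.1.2) for the same partition lies between the two values returned, and the gap between them shrinks as the partition is made finer.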


Refinement. If $R$ is a rectangle in $\mathbb{R}^d$ and if $P$ and $Q$ are partitions of $R$, then $Q$ is said to be a refinement of $P$ if every subrectangle of $R$ determined by $Q$ is a subset of some subrectangle determined by $P$.

If $R = [a_1, b_1] \times \cdots \times [a_d, b_d]$, then the partition $P$ consists of a partition of each of the intervals $[a_k, b_k]$, as does the partition $Q$. It is not difficult to see that $Q$ is a refinement of $P$ if and only if, for $k = 1, \dots, d$, the partition of $[a_k, b_k]$ determined by $Q$ is a refinement of the partition of this same interval determined by $P$. For this reason, it is also easy to see that any two partitions $P$, $Q$ of $R$ have a common refinement, since this is true for partitions of intervals.


If Q is a refinement of P, then since R is the union of the subrectangles of 
itself determined by a given partition, each subrectangle for P is a union of the 
subrectangles for Q which it contains. This is the key fact needed to prove the 
following theorem in essentially the same way as the analogous theorem in one 
variable (Theorem 5.1.4). The details are left to the exercises. 


Theorem 10.1.2. Let $f$ be a bounded function on a rectangle $R$ in $\mathbb{R}^d$. If $Q$ and $P$ are partitions of $R$ and if $Q$ is a refinement of $P$, then
$$L(f,P) \le L(f,Q) \le U(f,Q) \le U(f,P). \tag{10.1.3}$$


Let $P_1$ and $P_2$ be any two partitions of $R$ and let $Q$ be a common refinement of $P_1$ and $P_2$. Then (10.1.3) holds with the first $P$ replaced by $P_1$ and with the second $P$ replaced by $P_2$. The resulting inequalities imply the following.


Theorem 10.1.3. If $P_1$ and $P_2$ are partitions of $R$, then
$$L(f, P_1) \le U(f, P_2).$$


Thus, any lower sum for f is less than or equal to any upper sum for f. 
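The inequalities of Theorems 10.1.2 and 10.1.3 can be observed on a small example. The Python sketch below is only an illustration: the test function $f(x,y) = x + y^2$, the three partitions of $[0,1] \times [0,1]$, and the helper name `sums` are hypothetical, and the code again assumes that $f$ is non-decreasing in each variable so that sup and inf on a subrectangle are corner values.

```python
from fractions import Fraction as Fr
from itertools import product

def sums(f, parts):
    """Return (U, L) for f on the aligned partition `parts`.

    Assumes f is non-decreasing in each variable (corner values give sup/inf).
    """
    upper = lower = Fr(0)
    for idx in product(*[range(1, len(p)) for p in parts]):
        lo = [p[j - 1] for p, j in zip(parts, idx)]
        hi = [p[j] for p, j in zip(parts, idx)]
        vol = Fr(1)
        for a, b in zip(lo, hi):
            vol *= b - a
        upper += f(*hi) * vol
        lower += f(*lo) * vol
    return upper, lower

f = lambda x, y: x + y * y      # hypothetical test function

P = [[Fr(0), Fr(1, 2), Fr(1)], [Fr(0), Fr(1)]]
# Q adds the points 1/4 (first factor) and 1/3 (second factor), so Q refines P.
Q = [[Fr(0), Fr(1, 4), Fr(1, 2), Fr(1)], [Fr(0), Fr(1, 3), Fr(1)]]
# P2 is neither a refinement of P nor refined by P.
P2 = [[Fr(0), Fr(1, 3), Fr(1)], [Fr(0), Fr(2, 3), Fr(1)]]

U_P, L_P = sums(f, P)
U_Q, L_Q = sums(f, Q)
U_P2, L_P2 = sums(f, P2)
print(L_P, L_Q, U_Q, U_P)
```

Here `L_P <= L_Q <= U_Q <= U_P` is an instance of (10.1.3), while `L_P <= U_P2` and `L_P2 <= U_P` are instances of Theorem 10.1.3 for partitions that are not comparable.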


Upper and Lower Integrals. 


Definition 10.1.4. Let $R$ be a rectangle in $\mathbb{R}^d$ and let $f$ be a bounded real-valued function on $R$. The upper and lower integrals of $f$ on $R$ are defined by
$$\overline{\int}_R f(x)\, dV(x) = \inf\{U(f,P) : P \text{ a partition of } R\},$$
$$\underline{\int}_R f(x)\, dV(x) = \sup\{L(f,P) : P \text{ a partition of } R\}. \tag{10.1.4}$$


The set of all upper sums for $f$ is bounded below by any lower sum and the set of lower sums is bounded above by any upper sum. Thus, the inf (greatest lower bound) of the set of upper sums is greater than or equal to any lower sum and, hence, also greater than or equal to the sup (least upper bound) of the set of all lower sums. Thus,

Theorem 10.1.5. If $f$ is a bounded real-valued function on a rectangle $R$ and if $P$ and $Q$ are arbitrary partitions of $R$, then
$$L(f,P) \le \underline{\int}_R f(x)\, dV(x) \le \overline{\int}_R f(x)\, dV(x) \le U(f,Q).$$

The Integral. A bounded function on R is integrable if its upper and lower 
integrals are the same. That is: 


Definition 10.1.6. Let $R$ be a rectangle in $\mathbb{R}^d$ and let $f$ be a bounded real-valued function on $R$. If $\overline{\int}_R f(x)\, dV(x) = \underline{\int}_R f(x)\, dV(x)$, then we will say that $f$ is integrable on $R$. In this case, we will call the common value of these two expressions the Riemann integral of $f$ on $R$ and denote it by
$$\int_R f(x)\, dV(x).$$


The proofs of the following two theorems are exactly the same as the proofs of 
Theorems 5.1.7 and 5.1.8 and we will not repeat them here. 


Theorem 10.1.7. If $f$ is a bounded function on a rectangle $R$, then $f$ is Riemann integrable on $R$ if and only if, for each $\epsilon > 0$, there is a partition $P$ of $R$ such that
$$U(f,P) - L(f,P) < \epsilon. \tag{10.1.5}$$


Theorem 10.1.8. With $f$ and $R$ as above, $f$ is Riemann integrable on $R$ if and only if there is a sequence $\{P_n\}$ of partitions of $R$ such that
$$\lim\,(U(f,P_n) - L(f,P_n)) = 0. \tag{10.1.6}$$
In this case,
$$\int_R f(x)\, dV(x) = \lim S_n(f),$$
where, for each $n$, $S_n(f)$ may be chosen to be $U(f,P_n)$, $L(f,P_n)$, or any Riemann sum (10.1.2) for $f$ and the partition $P_n$.


Remark 10.1.9. The preceding two theorems both involve the difference between the upper and lower Riemann sums for $f$ and $P$. This can be written as
$$U(f,P) - L(f,P) = \sum_{j=1}^{n} (M_j - m_j) V(R_j). \tag{10.1.7}$$
The factors $M_j - m_j$ that appear in this expression are non-negative numbers, as are the numbers $V(R_j)$. Hence, any operation that reduces or eliminates some of the terms in this sum will result in a smaller sum.




Properties of the Integral. 


Theorem 10.1.10. If $f$ and $g$ are integrable functions on an aligned rectangle $R$ in $\mathbb{R}^d$ and if $c$ is a constant, then

(a) $cf$ is integrable and $\displaystyle\int_R c f(x)\, dV(x) = c \int_R f(x)\, dV(x)$;
(b) $f + g$ is integrable and $\displaystyle\int_R (f + g)(x)\, dV(x) = \int_R f(x)\, dV(x) + \int_R g(x)\, dV(x)$.


The proof differs in no essential way from the proof of Theorem 5.2.3, so we 
will not repeat it. 

Taken together, the statements of the above theorem mean that the integrable functions on $R$ form a vector space under pointwise addition and scalar multiplication of functions, and the integral is a linear transformation from this vector space to the vector space $\mathbb{R}$.


The order-preserving property is another key property of the integral. The 
version stated in the next theorem is somewhat more general than the analogous 
result, proved earlier for functions of a single variable (Theorem 5.2.4), and it has 
a different proof. Hence, we include the proof. 


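Theorem 10.1.11. If $f$ and $g$ are functions on an aligned rectangle $R$ in $\mathbb{R}^d$ and if $f(x) \le g(x)$ for all $x \in R$, then

(a) $\displaystyle\overline{\int}_R f(x)\, dV(x) \le \overline{\int}_R g(x)\, dV(x)$ and $\displaystyle\underline{\int}_R f(x)\, dV(x) \le \underline{\int}_R g(x)\, dV(x)$;

(b) $\displaystyle\int_R f(x)\, dV(x) \le \int_R g(x)\, dV(x)$ if $f$ and $g$ are integrable.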
R R 


Proof. We will prove this result for the upper integrals. The result for the lower 
integrals has an analogous proof. The result for the integral in the case of integrable 
functions then follows because upper integral, lower integral, and integral are all 
the same for an integrable function. 

Given a partition $P$ of $R$, determining subrectangles $\{R_1, \dots, R_n\}$ of $R$, we set
$$M_j(f) = \sup_{R_j} f \quad \text{and} \quad M_j(g) = \sup_{R_j} g.$$
Then $M_j(f) \le M_j(g)$ for all $j$ because $f(x) \le g(x)$ for all $x \in R$. Hence,
$$U(f,P) = \sum_{j=1}^{n} M_j(f) V(R_j) \le \sum_{j=1}^{n} M_j(g) V(R_j) = U(g,P).$$
It follows that
$$\overline{\int}_R f(x)\, dV(x) = \inf_P U(f,P) \le \inf_P U(g,P) = \overline{\int}_R g(x)\, dV(x).$$
This completes the proof. □




Figure 10.1.2. Computing the Integral of $\chi_S$.


A Simple Example. So far we have not computed a single integral or shown 
that a single function is integrable. We do so now. The function we will integrate 
is very simple, though not continuous, but the computation of its integral is an 
important step in our development of integration theory. 


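Definition 10.1.12. Let $E$ be a subset of $\mathbb{R}^d$. Then the characteristic function of $E$, denoted $\chi_E$, is the real-valued function on $\mathbb{R}^d$ defined by
$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$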


Our example is as follows: 


Example 10.1.13. Let $R$ and $S$ be aligned rectangles with $S \subset R$. Show that $\chi_S$ is an integrable function on $R$ and
$$\int_R \chi_S(x)\, dV(x) = V(S).$$

Solution: Let
$$R = [a_1, b_1] \times \cdots \times [a_d, b_d] \quad \text{and} \quad S = [s_1, t_1] \times \cdots \times [s_d, t_d],$$
where $a_j \le s_j \le t_j \le b_j$ for each $j$. Given $\epsilon > 0$, we choose a partition of $R$ as follows: for each $j$, we partition each interval $[a_j, b_j]$ with the points $\{a_j \le u_j \le s_j \le t_j \le v_j \le b_j\}$, where the points $u_j$ and $v_j$ are chosen so that if $A$ is the rectangle
$$A = [u_1, v_1] \times \cdots \times [u_d, v_d],$$
then $V(A) < V(S) + \epsilon$ (see Figure 10.1.2 for a two-dimensional version of this set-up).

The sup of $\chi_S$ on a given subrectangle $R_j$ is 1 if $R_j \cap S \ne \emptyset$ and it is 0 otherwise. The inf of $\chi_S$ on $R_j$ is 1 if $R_j \subset S$ and it is 0 otherwise.




There is only one subrectangle for this partition which is contained in $S$ and that is $S$ itself. Thus,
$$L(\chi_S, P) = V(S).$$
The union of the subrectangles $R_j$ that meet $S$ is $A$. Hence,
$$U(\chi_S, P) = V(A).$$
Since $V(S) \le V(A) < V(S) + \epsilon$, we have $V(A) - V(S) < \epsilon$. Hence,
$$U(\chi_S, P) - L(\chi_S, P) < \epsilon.$$

By Theorem 10.1.7, $\chi_S$ is integrable on $R$. Its integral is within $\epsilon$ of $L(\chi_S, P) = V(S)$ for every $\epsilon > 0$ and so $\int_R \chi_S(x)\, dV(x) = V(S)$.
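The partition used in this solution can be built explicitly. The Python sketch below is an illustration with hypothetical data: $R = [0,1] \times [0,1]$, $S = [1/4, 3/4] \times [1/4, 1/2]$, and a margin `delta` that plays the role of the gap between $u_j$ and $s_j$ (and between $t_j$ and $v_j$). It applies the two rules stated above for the sup and inf of $\chi_S$ on a subrectangle and returns the resulting upper and lower sums.

```python
from fractions import Fraction as Fr
from itertools import product

def sums_for_indicator(R, S, delta):
    """Return (U, L) for chi_S on R, using the partition a <= u <= s <= t <= v <= b.

    R and S are lists of (lo, hi) intervals, one per coordinate, with S inside R.
    delta is a hypothetical margin used to place u and v on each axis.
    """
    parts = []
    for (a, b), (s, t) in zip(R, S):
        u = max(a, s - delta)
        v = min(b, t + delta)
        parts.append([a, u, s, t, v, b])
    upper = lower = Fr(0)
    for idx in product(*[range(1, len(p)) for p in parts]):
        cell = [(p[j - 1], p[j]) for p, j in zip(parts, idx)]
        vol = Fr(1)
        for lo, hi in cell:
            vol *= hi - lo
        # sup chi_S = 1 exactly when the closed cell meets S
        meets = all(lo <= t and hi >= s for (lo, hi), (s, t) in zip(cell, S))
        # inf chi_S = 1 exactly when the cell is contained in S
        inside = all(s <= lo and hi <= t for (lo, hi), (s, t) in zip(cell, S))
        if meets:
            upper += vol
        if inside:
            lower += vol
    return upper, lower

R = [(Fr(0), Fr(1)), (Fr(0), Fr(1))]
S = [(Fr(1, 4), Fr(3, 4)), (Fr(1, 4), Fr(1, 2))]
U, L = sums_for_indicator(R, S, Fr(1, 100))
print(U, L, U - L)
```

The lower sum is $V(S)$ and the upper sum is $V(A)$, so shrinking `delta` drives the difference to zero, as in the argument above.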


Exercise Set 10.1 


1. Let $R = [0,1] \times [0,1]$ be the square with vertices at $(0,0)$, $(1,0)$, $(1,1)$, and $(0,1)$ and let $P$ be the partition of $R$ consisting of the partition $\{0, 1/4, 1/2, 3/4, 1\}$ in both factors of $[0,1] \times [0,1]$. Find $U(f,P)$ and $L(f,P)$ if $f(x,y) = xy$.

2. With $R$ and $P$ as in the previous problem, find $U(\chi_A, P)$ and $L(\chi_A, P)$ if $A$ is the closed, solid triangle with vertices at $(0,0)$, $(1,0)$, $(1,1)$.

3. Suppose $f$ and $g$ are functions defined on an aligned rectangle $R$. Suppose there is a positive constant $K$ such that $|f(x) - f(y)| \le K|g(x) - g(y)|$ for all $x, y \in R$. Prove that if $g$ is integrable on $R$, then so is $f$.


4. Use the result of the preceding exercise to prove that if f is an integrable 
function on an aligned rectangle R, then |f| is also integrable on R. 


5. Prove that if $f$ is integrable on $R$, then $f^2$ is also integrable on $R$.


6. Use the result of the preceding exercise to prove that if f and g are integrable 
on R, then fg is also integrable on R. 


7. Show that each constant function $k$ is integrable and $\int_R k\, dV(x) = kV(R)$.

8. If $f$ is an integrable function defined on the rectangle $R$ and if $|f(x)| \le M$ on $R$, where $M$ is a positive constant, then prove that $\left|\int_R f(x)\, dV(x)\right| \le MV(R)$.

9. Prove that if R is an aligned rectangle and f is a continuous function on R, 
then f is integrable on R. 

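10. If $A$ and $B$ are subsets of $\mathbb{R}^d$, then

(a) describe $\chi_{A \cap B}$ in terms of $\chi_A$ and $\chi_B$;

(b) describe $\chi_{A \cup B}$ in terms of $\chi_A$ and $\chi_B$;

(c) describe the meaning of $B \subset A$ in terms of $\chi_A$ and $\chi_B$;

(d) if $B \subset A$, describe $\chi_{A \setminus B}$ in terms of $\chi_A$ and $\chi_B$.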


10.2. Jordan Regions 


The concept of characteristic function of a set (Definition 10.1.12) allows us to define the volume of a set in terms of the integral that we just defined. It depends very much on the dimension of the ambient space $\mathbb{R}^d$ and so, technically, it should be called the $d$-volume of the set. However, as with rectangles, we will drop the $d$ when the dimension of the ambient space is understood.


Definition 10.2.1. If $E$ is a bounded subset of $\mathbb{R}^d$, let $R$ be an aligned rectangle containing $E$. Then we define the outer volume $\overline{V}(E)$, the inner volume $\underline{V}(E)$, and the volume $V(E)$ (if it exists) for $E$ by

(a) $\overline{V}(E) = \overline{\int}_R \chi_E(x)\, dV(x)$; $\underline{V}(E) = \underline{\int}_R \chi_E(x)\, dV(x)$; and
(b) $V(E) = \int_R \chi_E(x)\, dV(x)$ if $\chi_E$ is integrable.

If $V(E)$ exists, then we call $E$ a Jordan region.

Note that $E$ is a Jordan region if and only if $\overline{V}(E) = \underline{V}(E)$ and, in this case, $V(E)$ is their common value.

Note also that, if E is an aligned rectangle, then E is a Jordan region and the 
above definition of V(E) agrees with our earlier definition. This is demonstrated 
in Example 10.1.13. 


Implicit in the above definition is the fact that the upper and lower integrals of $\chi_E$ over $R$ do not depend on the rectangle $R$, as long as $R$ contains $E$. We leave a proof of this to the exercises (Exercise 10.2.1).


Example 10.2.2. Show that the closed, solid right triangle A in R? with vertices 
at (0,0), (a,0), and (0,6) is a Jordan region and has area (2-volume) ab/2. 
Solution: We choose R to be the rectangle [0,a] x [0,6]. This contains the 
triangle A. For each n, we choose a partition P, of R consisting of partitions 
{0,a/n,2a/n,...,na/n = a} of [0,a] and {0,b/n,2b/n,...,nb/n = b} of (0,8). 


This determines n? subrectangles of R, each of volume ab/n?. 


Now for each of these subrectangles Rj, the sup, Mj, and inf, mj, of Xa on 
R, is either 1 or 0. In fact, Mj = 1 if and only if R; A A 4 @ and mj = 1 if and 
only if Ry CA. 

Thus, the only subrectangles $R_j$ on which $M_j \neq m_j$ are those which are not
contained in $\Delta$ but have non-empty intersection with it (the light grey subrectangles
in Figure 10.2.1). There are two kinds of these, those of the form $[(k-1)a/n, ka/n] \times
[(k-1)b/n, kb/n]$ which are bisected by the line from $(0,0)$ to $(a,b)$ and those of
the form $[(k-1)a/n, ka/n] \times [kb/n, (k+1)b/n]$ which just have a lower right vertex
on this line. There are $n$ of the former and $n-1$ of the latter. The difference
$U(\chi_\Delta, P_n) - L(\chi_\Delta, P_n)$ is just the sum of the areas of these $2n-1$ rectangles, which
is $(2n-1)ab/n^2$. Hence,

\[ \lim_{n\to\infty} \bigl(U(\chi_\Delta, P_n) - L(\chi_\Delta, P_n)\bigr) = \lim_{n\to\infty} \frac{(2n-1)ab}{n^2} = 0. \]

By Theorem 10.1.8, the Riemann integral $\int_R \chi_\Delta(x)\,dV(x)$ exists and so the 2-volume
(area) of the set $\Delta$ exists; that is, $\Delta$ is a Jordan region.

Figure 10.2.1. Computing the Area of a Triangle.

Also by Theorem 10.1.8 the integral $\int_R \chi_\Delta(x)\,dV(x)$ is the limit of the sequence
$\{L(\chi_\Delta, P_n)\}$. However, $L(\chi_\Delta, P_n)$ is the sum of the areas of the subrectangles
that are contained in $\Delta$ (the dark grey subrectangles in Figure 10.2.1). There are
$n(n-1)/2$ of these (half the number remaining after the ones that are bisected by
the line from $(0,0)$ to $(a,b)$ are removed). Hence,

\[ V(\Delta) = \int_R \chi_\Delta(x)\,dV(x) = \lim_{n\to\infty} \frac{(n-1)ab}{2n} = \frac{ab}{2}. \]
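The counting argument in this example can be checked by brute force. The following is only a sketch: it assumes the triangle is the region under the line from (0,0) to (a,b), and the function name `triangle_sums` is ours, not the text's. It uses exact rational arithmetic so the sums can be compared with the formulas $(2n-1)ab/n^2$ and $(n-1)ab/(2n)$ without rounding.

```python
from fractions import Fraction

def triangle_sums(a, b, n):
    """Return (upper sum, lower sum) of the indicator of the triangle
    under the line from (0,0) to (a,b), for the n-by-n partition P_n."""
    cell_area = Fraction(a) * Fraction(b) / (n * n)
    meets = 0    # subrectangles with non-empty intersection with the triangle
    inside = 0   # subrectangles contained in the triangle
    for i in range(1, n + 1):        # column index: x in [(i-1)a/n, ia/n]
        for j in range(1, n + 1):    # row index:    y in [(j-1)b/n, jb/n]
            # In grid units the triangle is {0 <= v <= u <= n}.
            if j - 1 <= i:           # lowest point of the cell lies on or below the line
                meets += 1
            if j <= i - 1:           # highest point of the cell lies on or below the line
                inside += 1
    return meets * cell_area, inside * cell_area

upper, lower = triangle_sums(3, 2, 10)
print(upper - lower, lower)
```

As n grows, the gap between the two sums shrinks like 1/n while the lower sum approaches ab/2, which is the content of the two limits computed above.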


Properties of Volume. Many properties of the integral translate directly into 
properties of volume. For example, Theorem 10.1.11 implies that 


Theorem 10.2.3. If $E$ and $F$ are bounded subsets of $\mathbb{R}^d$ and $E \subset F$, then
$\overline{V}(E) \le \overline{V}(F)$ and $\underline{V}(E) \le \underline{V}(F)$.

If $E$ and $F$ are Jordan regions, then $V(E) \le V(F)$.

Theorem 10.1.10 and the fact that $\chi_{E\cup F} = \chi_E + \chi_F - \chi_{E\cap F}$ (Exercise 10.1.10)
imply

Theorem 10.2.4. If $E$, $F$, and $E \cap F$ are Jordan regions and $V(E \cap F) = 0$, then
$E \cup F$ is a Jordan region and

\[ V(E \cup F) = V(E) + V(F). \]

In particular, this identity holds if $E$ and $F$ are disjoint Jordan regions.

In particular, if $R$ is an aligned rectangle in $\mathbb{R}^d$ and $R_j \neq R_k$ are two of the
subrectangles determined by a partition $P$, then $R_j \cap R_k$ is either empty or it is a
degenerate aligned rectangle in $\mathbb{R}^d$; that is, its dimension is lower than that of $R$.
Hence, $V(R_j \cap R_k) = 0$. Thus, by Theorem 10.1.10,

\[ V(R_j \cup R_k) = V(R_j) + V(R_k). \]

An induction argument then shows that if $F$ is the union of any number of the
subrectangles determined by $P$, then $F$ is a Jordan region and $V(F)$ is the sum
of the volumes of these subrectangles. This is used in the proof of the following
theorem.


Theorem 10.2.5. If $E$ is a bounded subset of $\mathbb{R}^d$, then $\overline{V}(E) = \overline{V}(\overline{E})$ and $\underline{V}(E) =
\underline{V}(E^\circ)$.


Proof. Let $R$ be an aligned rectangle containing $E$, let $P$ be a partition of $R$,
and let $\{R_j\}$ be the list of subrectangles of $R$ determined by $P$. Then $U(\chi_E, P)$
is the sum of the volumes of the rectangles $R_j$ in this list that have a non-empty
intersection with $E$ (those for which $\chi_E$ takes on the value 1 somewhere on $R_j$). If
we set

\[ F = \bigcup \{R_j : E \cap R_j \neq \emptyset\}, \]

then $U(\chi_E, P) = V(F)$, by the paragraph preceding this theorem.

Now $F$ is a finite union of closed sets and so it is also closed. Since $E \subset F$, we
also have $\overline{E} \subset F$. Then

\[ \overline{V}(E) \le \overline{V}(\overline{E}) \le \overline{V}(F) = V(F) = U(\chi_E, P). \]

Since $\overline{V}(E) = \inf\{U(\chi_E, P) : P \text{ a partition of } R\}$, we have

\[ \overline{V}(E) \le \overline{V}(\overline{E}) \le \overline{V}(E). \]

Thus, $\overline{V}(\overline{E}) = \overline{V}(E)$.

Similarly, if we set

\[ G = \bigcup \{R_j : R_j \subset E\}, \]

then, since $G^\circ \subset E^\circ$,

\[ \underline{V}(G^\circ) \le \underline{V}(E^\circ) \le \underline{V}(E). \]

However, $V(G^\circ) = V(G) = L(\chi_E, P)$, since the boundary of $G$ consists of a finite
union of rectangles of dimension lower than $d$, and these all have volume 0. Since
$\sup_P L(\chi_E, P) = \underline{V}(E)$, we conclude that $\underline{V}(E^\circ) = \underline{V}(E)$. This completes the
proof. □
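Theorem 10.2.6. If $E$ is a Jordan region, then so are $\overline{E}$ and $E^\circ$. Furthermore,
$V(E) = V(\overline{E}) = V(E^\circ)$.

Proof. In view of the previous theorem,

\[ \underline{V}(E) \le \underline{V}(\overline{E}) \le \overline{V}(\overline{E}) \le \overline{V}(E). \]

If $E$ is a Jordan region, then $\underline{V}(E) = \overline{V}(E)$ and, hence, each of the above inequalities
is an equality. This implies $\overline{E}$ is a Jordan region and $V(\overline{E}) = V(E)$. The proof of
the statement for $E^\circ$ is similar. □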


Sets of Volume Zero. We leave the proof of the following theorem to the 
exercises. 
Theorem 10.2.7. If $E$ is a bounded set with $\overline{V}(E) = 0$, then $E$ is a Jordan region
with volume 0. Any subset of a Jordan region of volume 0 is also a Jordan region 
of volume 0. A finite union of Jordan regions of volume 0 is also a Jordan region 
of volume 0. 


We will, henceforth, refer to a set E with V(E) = 0 as simply a set of volume 0. 




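Theorem 10.2.8. A set $E$ is a set of volume 0 if and only if, for each $\epsilon > 0$, there
is a finite set $\{R_1, \ldots, R_n\}$ of aligned rectangles such that

\[ E \subset \bigcup_{j=1}^{n} R_j \quad\text{and}\quad \sum_{j=1}^{n} V(R_j) < \epsilon. \]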


Proof. If $V(E) = 0$, then there exist an aligned rectangle $R$ with $E$ in its interior
and a partition $P$ of $R$ such that $U(\chi_E, P) < \epsilon$. This just means that those
subrectangles determined by $P$ which meet $E$ have volumes which add up to a
number less than $\epsilon$. Since $E$ is contained in the union of these rectangles, the proof
of the “only if” part of the theorem is complete.


On the other hand, if $E \subset F = \bigcup_{j=1}^{n} R_j$ for a set of aligned rectangles with
volumes adding up to a number less than $\epsilon$, then $\overline{V}(E) < \epsilon$ since

\[ \chi_E \le \sum_{j=1}^{n} \chi_{R_j}. \]

This, together with the fact that each $\chi_{R_j}$ is integrable, implies

\[ \overline{V}(E) = \overline{\int}_R \chi_E(x)\,dV(x) \le \sum_{j=1}^{n} \int_R \chi_{R_j}(x)\,dV(x) = \sum_{j=1}^{n} V(R_j). \]

This proves the “if” part of the theorem. □


A Characterization of Jordan Regions. 


Theorem 10.2.9. A bounded set $E$ is a Jordan region if and only if its boundary,
$\partial E$, is a set of volume 0.


Proof. If $P$ is a partition of $R$ determining a list of subrectangles $\{R_j\}$, then
$L(\chi_{E^\circ}, P)$ is the sum of the areas of those $R_j$ which are entirely contained in
$E^\circ$, while $U(\chi_{\overline{E}}, P)$ is the sum of the areas of those $R_j$ which have non-empty
intersection with $\overline{E}$. It follows that

\[ U(\chi_{\overline{E}}, P) - L(\chi_{E^\circ}, P) = U(\chi_{\partial E}, P). \]

Hence, a sequence $\{P_n\}$ of partitions has the property that $\lim U(\chi_{\partial E}, P_n) = 0$ if
and only if it has the property that

\[ \lim \bigl(U(\chi_{\overline{E}}, P_n) - L(\chi_{E^\circ}, P_n)\bigr) = 0. \]

Since, for an appropriately chosen sequence of partitions, this limit is

\[ \overline{V}(\overline{E}) - \underline{V}(E^\circ) = \overline{V}(E) - \underline{V}(E), \]

by Theorem 10.2.5, we conclude that $\overline{V}(E) = \underline{V}(E)$ if and only if $\overline{V}(\partial E) = 0$;
that is, $E$ is a Jordan region if and only if $\partial E$ is a set of volume 0. □




Theorem 10.2.10. If $A$ and $B$ are Jordan regions, then $A \cap B$, $A \cup B$, and
$A \setminus (A \cap B)$ are also Jordan regions. Furthermore,

\[ V(A \cup B) = V(A) + V(B) - V(A \cap B), \quad\text{and} \]

\[ V(A \setminus (A \cap B)) = V(A) - V(A \cap B). \]


Proof. Each of the sets $A \cap B$, $A \cup B$, and $A \setminus (A \cap B)$ has its boundary contained
in $\partial A \cup \partial B$. Since $A$ and $B$ are Jordan regions, $\partial A$ and $\partial B$ are sets of volume
0. Then Theorem 10.2.7 implies that $\partial A \cup \partial B$ has volume 0, as does each of its
subsets. It follows from the previous theorem that $A \cap B$, $A \cup B$, and $A \setminus (A \cap B)$
are Jordan regions.

The second statement of the theorem follows from the identities

\[ \chi_{A\cup B} = \chi_A + \chi_B - \chi_{A\cap B} \quad\text{and}\quad \chi_{A\setminus(A\cap B)} = \chi_A - \chi_{A\cap B}. \]
□

Example 10.2.11. Let $K$ be a compact subset of $\mathbb{R}^{d-1}$ and let $f : K \to \mathbb{R}$ be a
continuous function. Show that the graph $G(f)$ of $f$ is a set of $d$-volume 0, where
$G(f) = \{(x, f(x)) : x \in K\}$.

Solution: Since $K$ is compact, it is bounded, and so we may choose a rectangle
$R$ in $\mathbb{R}^{d-1}$ which contains $K$. Let $W$ be the $(d-1)$-volume of $R$.

Since $K$ is compact and $f$ is continuous, $f$ is actually uniformly continuous.
Thus, given $\epsilon > 0$, we may choose a $\delta > 0$ such that

\[ |f(x) - f(y)| < \epsilon/W \quad\text{whenever}\quad \|x - y\| < \delta. \]

We let $P$ be a partition of $R$ such that the diameter of each subrectangle for the
partition is less than $\delta$ (diameter in this case means maximal distance between two
points in the subrectangle). Let $R_1, R_2, \ldots, R_n$ be a list of those subrectangles for
this partition which meet $K$. If

\[ m_j = \min\{f(x) : x \in K \cap R_j\} \quad\text{and}\quad M_j = \max\{f(x) : x \in K \cap R_j\}, \]

then

\[ G(f) \subset \bigcup_j R_j \times [m_j, M_j]. \]

The sum of the volumes of the rectangles $R_j \times [m_j, M_j]$ is

\[ \sum_j V(R_j)(M_j - m_j) < \frac{\epsilon}{W} \sum_j V(R_j) \le \frac{\epsilon}{W}\, W = \epsilon. \]

By Theorem 10.2.8 the graph $G(f)$ of $f$ is a set of volume 0.
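The covering used in this solution is easy to compute in the simplest case $d = 2$, $K = [0,1]$. The sketch below is an illustration only: the function names are ours, and it assumes $f$ is increasing on $[0,1]$ so that the minimum and maximum on each subinterval are attained at the endpoints. It returns the total area of the rectangles $R_j \times [m_j, M_j]$ for the partition of $[0,1]$ into $n$ equal pieces.

```python
from fractions import Fraction

def graph_cover_area(f, n):
    """Total area of the rectangles R_j x [m_j, M_j] covering the graph of f
    over [0, 1], for n equal subintervals. Assumes f is increasing."""
    total = Fraction(0)
    for j in range(1, n + 1):
        left, right = Fraction(j - 1, n), Fraction(j, n)
        m_j, M_j = f(left), f(right)   # min and max on [left, right] for increasing f
        total += (right - left) * (M_j - m_j)
    return total

def square(x):
    # Hypothetical sample function, increasing on [0, 1].
    return x * x

for n in (10, 100, 1000):
    print(n, graph_cover_area(square, n))
```

For an increasing function the sum telescopes to $(f(1) - f(0))/n$, so the covering area can be made smaller than any given $\epsilon$, in line with Theorem 10.2.8.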


Exercise Set 10.2 


1. Prove that $\overline{\int}_R \chi_E(x)\,dV(x)$ and $\underline{\int}_R \chi_E(x)\,dV(x)$ do not depend on the choice of
the aligned rectangle $R$ as long as it contains $E$.

2. Prove Theorem 10.2.7; that is, show that if a subset $A$ of $\mathbb{R}^d$ has outer volume
zero, then it and each of its subsets is a Jordan region of volume 0.


3. Show that a finite set in $\mathbb{R}^d$ has volume 0.




4. If $E$ is the subset of the unit square $[0,1] \times [0,1]$ consisting of points with both
coordinates rational numbers, find its inner volume $\underline{V}(E)$ and outer volume
$\overline{V}(E)$. Is $E$ a Jordan region?

5. Show that if $A$ and $B$ are sets of volume 0 in $\mathbb{R}^d$, then $A \cup B$ is also a set of
volume 0.


6. Let $U$ be an open subset of $\mathbb{R}^2$ and let $K \subset U$ be a compact set. Suppose
$f : U \to \mathbb{R}$ is a smooth function and $E = \{(x,y) \in K : f(x,y) = 0\}$. If $df$ is
never 0 on $E$, then show that $E$ is a set of area 0 in $\mathbb{R}^2$.

7. Show that an ellipse in $\mathbb{R}^2$ is a set of area 0 in $\mathbb{R}^2$ and that the solid ellipse that
it bounds is a Jordan region.

8. Show that a bounded subset of $\mathbb{R}^2$ whose boundary is a finite union of smooth
parameterized curves is a Jordan region. 

9. Consider the following three reflection transformations of $\mathbb{R}^2$:

\[ T_1(x,y) = (-x, y), \quad T_2(x,y) = (x, -y), \quad\text{and}\quad T_3(x,y) = (y, x). \]

These are reflection through the $y$-axis, reflection through the $x$-axis, and
reflection through the line $y = x$, respectively. Prove that if $E$ is a Jordan region,
then, for $j = 1, 2, 3$, so is $T_j(E)$ and $V(T_j(E)) = V(E)$. Hint: What do these
reflections do to aligned rectangles and their volumes?

10. Using the previous two exercises and theorems from this section but without
using Example 10.2.2, give a proof that the area of a triangle with one side
parallel to a coordinate axis is one half its base times its height. Hint: Prove
this first for right triangles with legs parallel to the axes.

11. Using the result of the preceding exercise, show that a parallelogram in $\mathbb{R}^2$ with
one side parallel to a coordinate axis has area equal to its base times its height.

12. Suppose $B \subset \mathbb{R}^d$ is a compact Jordan region and $f$ and $g$ are continuous
real-valued functions on $B$ with $g(x) \le f(x)$. Show that the set

\[ A = \{(x,t) \in \mathbb{R}^{d+1} : x \in B \text{ and } g(x) \le t \le f(x)\} \]

is also a Jordan region.


10.3. The Integral over a Jordan Region 


In this section we extend the definition of the integral to cover integration over a 
Jordan region. We also prove an existence theorem which shows that the class of 
integrable functions is quite large. 


An Existence Theorem. So far we have only proved the existence of the integral 
for a few functions of the form $\chi_E$. Our next objective is to prove a general existence
theorem for the integral over an aligned rectangle. We will then extend this theorem 
to integrals over Jordan regions. 


Theorem 10.3.1. Let $f$ be a bounded function on an aligned rectangle $R$. If the
set of points of $R$ at which $f$ is not continuous is a set of volume 0, then $f$ is
integrable on $R$.




Proof. Let $E$ be the set of points of $R$ at which $f$ is not continuous. Since $E$ is a
set of volume 0, its outer volume $\overline{V}(E)$ is 0. Hence, given $\epsilon > 0$, there is a partition
$P$ of $R$ such that $U(\chi_E, P) < \epsilon/(4M)$, where $M$ is the sup of $|f|$ on $R$. If $A$ is the
union of the subrectangles for $P$ which meet $E$, then this means that

\[ V(A) = U(\chi_E, P) < \frac{\epsilon}{4M}. \]

Let $B$ be the union of the subrectangles for $P$ which do not meet $E$. Note that
$A \cup B = R$ and $B$ is a closed, bounded (hence compact) set on which $f$ is continuous.
Hence, $f$ is uniformly continuous on $B$ by Theorem 8.2.12. This implies that we
may choose a $\delta > 0$ such that, for $x, y \in B$,

\[ |f(x) - f(y)| < \frac{\epsilon}{2V(R)} \quad\text{whenever}\quad \|x - y\| < \delta. \]

We next choose a refinement $Q$ for the partition $P$ in such a way that the
diameter of each subrectangle for $Q$ is less than $\delta$. If $R_1, R_2, \ldots, R_n$ is a list of the
subrectangles for $Q$, then each $R_j$ is either in $A$ or in $B$. We let $S$ be the set of
integers $j$ in $[1,n]$ such that $R_j \subset A$ and we let $T$ be the set of integers $j$ in this
interval such that $R_j \subset B$. If $M_j$ and $m_j$ are the sup and inf of $f$ on $R_j$, then

\[ U(f,Q) - L(f,Q) = \sum_{j=1}^{n} (M_j - m_j)V(R_j) \]
\[ \le \sum_{j\in S} (M_j - m_j)V(R_j) + \sum_{j\in T} (M_j - m_j)V(R_j) \]
\[ \le 2M\,V(A) + \frac{\epsilon}{2V(R)}\,V(B) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]

In view of Theorem 10.1.7, the proof is complete. □


The Integral over a Jordan Region. 


Definition 10.3.2. Let $A$ be a Jordan region and let $f$ be a bounded function
defined on a set containing $A$. We define a new function $f_A$, with domain all of $\mathbb{R}^d$,
as follows:

\[ f_A(x) = \begin{cases} f(x) & \text{if } x \in A, \\ 0 & \text{if } x \in \mathbb{R}^d \setminus A. \end{cases} \]

Thus, $f_A$ is a function defined on all of $\mathbb{R}^d$. It agrees with $f$ on $A$ and is 0 on
the complement of $A$. Note that $f$ may be originally defined on a larger set than
$A$ or it may be defined just on $A$. In the definition of $f_A$, it doesn’t matter.


Example 10.3.3. Let $A = D_1(0,0)$ in $\mathbb{R}^2$. Find $f_A$ and $g_A$ if $f$ is defined on $\mathbb{R}^2$
by $f(x,y) = x^2 + y^2$ and $g$ is defined on $A$ by $g(x,y) = \sqrt{1 - x^2 - y^2}$.

Solution: From the above definition, we have

\[ f_A(x,y) = \begin{cases} x^2 + y^2 & \text{if } (x,y) \in D_1(0,0), \\ 0 & \text{if } (x,y) \notin D_1(0,0), \end{cases} \]

and

\[ g_A(x,y) = \begin{cases} \sqrt{1 - x^2 - y^2} & \text{if } (x,y) \in D_1(0,0), \\ 0 & \text{if } (x,y) \notin D_1(0,0). \end{cases} \]

Note that here $f$ is defined originally on all of $\mathbb{R}^2$ while $g$ is defined only on $A$.

Definition 10.3.4. With $A$, $f$, and $f_A$ as in the preceding definition, let $R$ be an
aligned rectangle containing $A$. If $f_A$ is integrable on $R$, we say $f$ is integrable on
$A$ and we write

\[ \int_A f(x)\,dV(x) = \int_R f_A(x)\,dV(x). \]

Implicit in the above definition is the assumption that $\int_R f_A(x)\,dV(x)$ does not
depend on which rectangle $R$ is chosen, as long as it contains $A$. We leave the proof
of this to the exercises.


If $A$ happens to be an aligned rectangle, then one choice for $R$ in the above
definition is $R = A$. Then $f = f_A$ on the rectangle $R$ and

\[ \int_A f(x)\,dV(x) = \int_R f_A(x)\,dV(x) = \int_R f(x)\,dV(x), \]

where, on the right, the integral over $R$ is the one defined in Section 10.1, while the
one on the left is our new definition of the integral over a Jordan region. Fortunately,
the two agree.


Existence of the Integral over a Jordan Region. 


Theorem 10.3.5. Let $A$ be a Jordan region and let $f$ be a bounded function defined
on $A$. If the set $E$ of points of $A$ at which $f$ is not continuous is a set of volume 0,
then $f$ is integrable on $A$.

Proof. Since both $E$ and $\partial A$ are sets of volume 0, their union $F = E \cup \partial A$ is also.
We choose an aligned rectangle $R$ such that $A \subset R$. Then $f_A$ is continuous on
$R \setminus F$. It follows from Theorem 10.3.1 that $f_A$ is integrable on $R$ and, by definition,
$f$ is integrable on $A$. □


Properties of the Integral. The following theorem is the extension to Jordan 
regions of the result of Exercise 10.1.6. Its proof is left to the exercises. 


Theorem 10.3.6. If $A$ is a Jordan region and $f$ and $g$ are integrable functions on
A, then fg is also integrable on A. 


Example 10.3.7. Prove that if B C A and if A and B are Jordan regions, then 
each function f which is integrable on A is also integrable on B. 


Solution: This follows immediately from the preceding theorem and the
observation that $f_B = \chi_B f_A$.


The next three theorems follow from Theorems 10.1.11, 10.1.10, and 10.3.6 
and some observations about the passage from $f$ to $f_A$. We leave the details to the
exercises. 




Theorem 10.3.8. If $A$ is a Jordan region and $f$ and $g$ are integrable functions on
$A$ with $f(x) \le g(x)$ for all $x \in A$, then

\[ \int_A f(x)\,dV(x) \le \int_A g(x)\,dV(x). \]


Parts (b) and (c) of the next theorem mean that the integral over A is a linear 
transformation. 


Theorem 10.3.9. Let $A$ be a Jordan region, let $f$ and $g$ be integrable functions on
$A$, and let $c$ be a scalar constant. Then $f + g$ and $cf$ are integrable on $A$, and

(a) $\int_A 1\,dV(x) = V(A)$;
(b) $\int_A (f(x) + g(x))\,dV(x) = \int_A f(x)\,dV(x) + \int_A g(x)\,dV(x)$;
(c) $\int_A c f(x)\,dV(x) = c \int_A f(x)\,dV(x)$.


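Theorem 10.3.10. Let $A$ and $B$ be Jordan regions with $V(A \cap B) = 0$ and let $f$
be a bounded function on $A \cup B$. Then $f$ is integrable on $A$ and on $B$ if and only
if it is integrable on $A \cup B$. In this case,

\[ \int_{A\cup B} f(x)\,dV(x) = \int_A f(x)\,dV(x) + \int_B f(x)\,dV(x). \]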


Integral of a Sequence. 


Theorem 10.3.11. Let $A$ be a Jordan region and let $\{f_n\}$ be a sequence of
integrable functions on $A$. If $\{f_n\}$ converges uniformly on $A$ to a function $f$, then $f$ is
integrable and

\[ \lim_{n\to\infty} \int_A f_n(x)\,dV(x) = \int_A f(x)\,dV(x). \]


Proof. We prove this first in the case where $A$ is an aligned rectangle $R$.
Given $\epsilon > 0$, there is an $N$ such that $|f(x) - f_n(x)| < \epsilon/(2V(R))$ whenever $x \in R$
and $n > N$. This means that, for $n > N$,

\[ f_n(x) - \frac{\epsilon}{2V(R)} < f(x) < f_n(x) + \frac{\epsilon}{2V(R)} \]

for all $x \in R$. By Theorem 10.1.11 this implies that

\[ \underline{\int}_R \Bigl(f_n(x) - \frac{\epsilon}{2V(R)}\Bigr)\,dV(x) \le \underline{\int}_R f(x)\,dV(x) \le \overline{\int}_R f(x)\,dV(x) \le \overline{\int}_R \Bigl(f_n(x) + \frac{\epsilon}{2V(R)}\Bigr)\,dV(x). \]

Since $f_n$ and the constant $\epsilon/(2V(R))$ are integrable, their upper and lower integrals
are the same and are equal to their integrals. Thus,

\[ \int_R f_n(x)\,dV(x) - \frac{\epsilon}{2} \le \underline{\int}_R f(x)\,dV(x) \le \overline{\int}_R f(x)\,dV(x) \le \int_R f_n(x)\,dV(x) + \frac{\epsilon}{2}. \]

Since $\epsilon$ is an arbitrary positive number, we conclude that

\[ \underline{\int}_R f(x)\,dV(x) = \overline{\int}_R f(x)\,dV(x) \]

and, hence, that $f$ is integrable on $R$. These inequalities also show that

\[ \Bigl| \int_R f_n(x)\,dV(x) - \int_R f(x)\,dV(x) \Bigr| < \epsilon \quad\text{whenever}\quad n > N. \]

Thus, $\lim \int_R f_n(x)\,dV(x) = \int_R f(x)\,dV(x)$.

Now if $A$ is not an aligned rectangle, we simply choose an aligned rectangle $R$
which contains $A$ and replace $f$ and $f_n$ by $f_A$ and $(f_n)_A$ in the above argument.
We note that $\{(f_n)_A\}$ converges uniformly to $f_A$ on $R$ if $\{f_n\}$ converges uniformly
to $f$ on $A$. The conclusion is that $f_A$ is integrable on $R$ and

\[ \lim \int_R (f_n)_A(x)\,dV(x) = \int_R f_A(x)\,dV(x). \]

This implies that $f$ is integrable on $A$ and

\[ \lim \int_A f_n(x)\,dV(x) = \int_A f(x)\,dV(x). \]
□

Example 10.3.12. Show that if $f$ is a bounded function on a Jordan region $A$ and
if $\{x \in A : f(x) \le r\}$ is a Jordan region for each $r \in \mathbb{R}$, then $f$ is integrable on $A$.

Solution: Since $f$ is bounded, there is an $M > 0$ such that $-M < f(x) \le M$
for all $x \in A$. We set

\[ g(x) = \frac{f(x) + M}{2M} \quad\text{so that}\quad f(x) = 2M g(x) - M. \]

The function $g$ also satisfies the hypothesis of the theorem, and $0 < g(x) \le 1$ for all
$x \in A$. We will show that $g$ is integrable. This clearly implies that $f$ is integrable.


We will show that $g$ is integrable by expressing it as a uniform limit of a
sequence of integrable functions. This sequence is constructed as follows. For each
positive integer $n$ and each positive integer $k \le n$, we set

\[ E(n,k) = \{x \in A : (k-1)/n < g(x) \le k/n\} = \{x \in A : g(x) \le k/n\} \setminus \{x \in A : g(x) \le (k-1)/n\}. \]

By hypothesis, $E(n,k)$ is a Jordan region and so $\chi_{E(n,k)}$ is integrable. Also, for
each $n$, $A = \bigcup_{k=1}^{n} E(n,k)$. We define an integrable function $g_n$ on $A$ by

\[ g_n(x) = \sum_{k=1}^{n} \frac{k-1}{n}\, \chi_{E(n,k)}(x). \]

That is,

\[ g_n(x) = \frac{k-1}{n} \quad\text{if}\quad x \in E(n,k). \]

Since $g_n$ is a linear combination of integrable functions, it is integrable. Also

\[ 0 < g(x) - g_n(x) \le k/n - (k-1)/n = 1/n \quad\text{if}\quad x \in E(n,k). \]

Since every $x \in A$ is in $E(n,k)$ for some $k$, we conclude that

\[ |g(x) - g_n(x)| \le 1/n \quad\text{for all}\quad x \in A. \]

This implies that $\{g_n\}$ converges uniformly to $g$ on $A$. By the previous theorem, $g$
is integrable on $A$. Hence, $f$ is integrable on $A$.
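The step functions in this solution only depend on the value of the rescaled function at a point, so the uniform bound can be checked directly on values. The sketch below is an illustration under our own naming: `step_approx(value, n)` returns $(k-1)/n$ for the unique $k$ with $(k-1)/n < \text{value} \le k/n$, assuming $0 < \text{value} \le 1$, and uses exact fractions to avoid rounding at the breakpoints.

```python
import math
from fractions import Fraction

def step_approx(value, n):
    """Value of the n-th step function at a point where the rescaled
    function equals `value`, assuming 0 < value <= 1."""
    k = math.ceil(value * n)      # the unique k with (k-1)/n < value <= k/n
    return Fraction(k - 1, n)

# Largest gap between the value and its step approximation over a sample of values.
for n in (1, 7, 100):
    worst = max(Fraction(i, 1000) - step_approx(Fraction(i, 1000), n)
                for i in range(1, 1001))
    print(n, worst)
```

The gap never exceeds $1/n$ regardless of the point, which is exactly what makes the convergence uniform.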


Exercise Set 10.3

1. Prove that the integral $\int_R f_A(x)\,dV(x)$ that appears in Definition 10.3.4 does
not depend on the choice of $R$ as long as $R$ contains $A$.

2. Prove Theorem 10.3.6. You may use the result of Exercise 10.1.6.

3. Prove Theorem 10.

4. Prove Theorem 10.

5. Prove Theorem 10.

6. Prove that if $A$ and $B$ are Jordan regions with $B \subset A$ and if $f$ is a non-negative
integrable function on $A$, then $\int_B f(x)\,dV(x) \le \int_A f(x)\,dV(x)$.

7. Prove that if $f$ is an integrable function on a Jordan region $A$, then $|f|$ is
integrable and

\[ \Bigl| \int_A f(x)\,dV(x) \Bigr| \le \int_A |f(x)|\,dV(x). \]

8. Let $A$ be a Jordan region and let $f$ be an integrable function on $A$. For each
$x \in A$ define $f^+(x)$ and $f^-(x)$ by

\[ f^+(x) = \max\{f(x), 0\} \quad\text{and}\quad f^-(x) = \max\{-f(x), 0\} = (-f(x))^+. \]

Prove that $f^+$ and $f^-$ are non-negative functions on $A$ with $f = f^+ - f^-$ and
$|f| = f^+ + f^-$. Then prove that $f^+$ and $f^-$ are integrable.

9. Prove that if $f$ is a bounded function on a set $A$ of volume 0, then $f$ is integrable
on $A$ and $\int_A f(x)\,dV(x) = 0$.

10. This exercise is preparation for doing the next two exercises. Let $U$ be an open
Jordan region. Show that for each $\epsilon > 0$ there is a compact set $K$ which is a
finite union of aligned rectangles contained in $U$ and which has the property
that $U \setminus K$ has volume less than $\epsilon$. Hint: See Theorem 10.2.9.

11. Use the results of the preceding exercise and Exercise 7.4.1 to prove the
following: let $U$ be an open Jordan region and let $\{K_n\}$ be an increasing sequence of
compact Jordan subsets of $U$ such that $U = \bigcup_n K_n^\circ$. Then, for each integrable
function $f$ on $U$,

\[ \int_U f(x)\,dV(x) = \lim_n \int_{K_n} f(x)\,dV(x). \]

12. Show that every open Jordan region $U$ is the union of the interiors of an increasing
sequence of compact Jordan subsets of $U$, as in the previous exercise.

13. Let $A$ be a Jordan region and let $f$ be an integrable function on $A$. The average
value of $f$ on $A$ is defined to be the number

\[ \operatorname{avg}(f, A) = \frac{1}{V(A)} \int_A f(x)\,dV(x). \]

If $A$ is compact and connected and if $f$ is continuous on $A$, prove that there is
a point $x_0 \in A$ at which $f(x_0) = \operatorname{avg}(f, A)$.


14. Suppose $A$ is a Jordan region in $\mathbb{R}^d$ and $g_k$ is an integrable function on $A$ for
$k = 1, 2, \ldots$. Prove that if

\[ g(x) = \sum_{k=1}^{\infty} g_k(x), \]

where this series converges uniformly on $A$, then $g$ is integrable and

\[ \int_A g(x)\,dV(x) = \sum_{k=1}^{\infty} \int_A g_k(x)\,dV(x). \]


15. Prove that the function $g$ on $\mathbb{R}^2$, defined by

\[ g(x,y) = \sum_{k=1}^{\infty} \frac{1}{k^2} \sin(kx)\sin(ky), \]

is integrable on any Jordan region in $\mathbb{R}^2$.


10.4. Iterated Integrals 


Integrals of functions of a single variable may be calculated exactly in a wide range 
of situations. The theorem that makes this possible is the Fundamental Theorem 
of Calculus. We calculate an integral by finding (if we can) an antiderivative for 
the integrand, then evaluating at the endpoints and subtracting. Fortunately, there 
is a theorem which often makes it possible to use this same procedure to compute 
integrals in several variables. This theorem is Fubini’s Theorem, and it tells us that, 
in many situations, we may calculate an integral in several variables by integrating 
with respect to one variable at a time. 


An Additivity Lemma. We begin our discussion of Fubini’s Theorem with a 
lemma that will play an important role in the proof. 


Theorem 10.3.10 says that if $A$ and $B$ are Jordan regions with $V(A \cap B) = 0$
then the integral of an integrable function over AU B is the sum of the integrals of 
the function over A and over B. If f is not integrable, only bounded, the analogous 
result holds for the upper integral of f and for the lower integral of f. We will only 
need the following special case of this result. 


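Lemma 10.4.1. Suppose $R = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]$ is an aligned rectangle
in $\mathbb{R}^d$ and $f$ is a bounded function on $R$. Suppose that $R = R' \cup R''$, where $R'$ and
$R''$ are obtained from $R$ by dividing one of the intervals $[a_j, b_j]$ into two adjacent
subintervals $[a_j, c]$, $[c, b_j]$ and leaving the other intervals alone. Then

\[ \underline{\int}_R f(x)\,dV(x) = \underline{\int}_{R'} f(x)\,dV(x) + \underline{\int}_{R''} f(x)\,dV(x), \]

and

\[ \overline{\int}_R f(x)\,dV(x) = \overline{\int}_{R'} f(x)\,dV(x) + \overline{\int}_{R''} f(x)\,dV(x). \]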


Proof. The proof of this is similar to the proof of the Interval Additivity Theorem 
for the single variable integral (Theorem 5.2.8). 


The proof is the same no matter which of the intervals $[a_j, b_j]$ is divided at $c$,
so we may as well assume that it is the first one. We write $R = [a_1, b_1] \times S$, where
$S = [a_2, b_2] \times \cdots \times [a_d, b_d]$. Then $R' = [a_1, c] \times S$ and $R'' = [c, b_1] \times S$.

A partition of $R'$ is determined by a partition $T'$ of $[a_1, c]$ and a partition $Q'$ of
$S$, while a partition of $R''$ is determined by a partition $T''$ of $[c, b_1]$ and a partition
$Q''$ of $S$. That is, partitions of $R'$ have the form $T' \times Q'$ and partitions of $R''$ have
the form $T'' \times Q''$.

We may replace $Q'$ and $Q''$ by a common refinement $Q$. Then $P' = T' \times Q$
and $P'' = T'' \times Q$ are refinements of our original partitions of $R'$ and $R''$. If we let
$T = T' \cup T''$, then $T$ is a partition of $[a_1, b_1]$ which, together with the partition $Q$ of
$S$, determines a partition $P = T \times Q$ of $R = R' \cup R''$. Furthermore, every partition
of $R$ has a refinement which is of this form (just add the point $c$ to the partition of
the first interval $[a_1, b_1]$). We will say that a triple $(P, P', P'')$ of partitions of this
form is a compatible triple of partitions. The important point about a compatible
triple $(P, P', P'')$ of partitions is that the set of subrectangles for the partition $P$
is the disjoint union of the set of subrectangles for the partition $P'$ and the set of
subrectangles for the partition $P''$. This implies that

(10.4.1) $L(f,P) = L(f,P') + L(f,P'')$ and $U(f,P) = U(f,P') + U(f,P'')$.

Since each lower integral is the supremum of the corresponding lower sums, we
may choose sequences of partitions $\{P_n\}$, $\{P_n'\}$, and $\{P_n''\}$ of $R$, $R'$, and $R''$ such
that each of the sequences of lower sums $L(f,P_n)$, $L(f,P_n')$, and $L(f,P_n'')$ is an
increasing sequence converging to the corresponding lower integral. Since replacing
a partition by a refinement results in a lower sum which is at least as large as the
original, the partitions $P_n$, $P_n'$, and $P_n''$ may be chosen so as to form a compatible
triple of partitions. Then, for each $n$, (10.4.1) holds with $P$, $P'$, and $P''$ replaced
by $P_n$, $P_n'$, and $P_n''$. On passing to the limit, this implies that

\[ \underline{\int}_R f(x)\,dV(x) = \underline{\int}_{R'} f(x)\,dV(x) + \underline{\int}_{R''} f(x)\,dV(x). \]

This proves the lemma in the case of lower integrals. The upper integral case has
the same proof. □


This leads directly to the following lemma. 


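Lemma 10.4.2. Let $R$ be an aligned rectangle in $\mathbb{R}^d$ and let $f$ be a bounded
function on $R$. If a certain partition of $R$ determines the collection of subrectangles
$\{R_1, R_2, \ldots, R_n\}$ of $R$, then

\[ \underline{\int}_R f(x)\,dV(x) = \sum_{j=1}^{n} \underline{\int}_{R_j} f(x)\,dV(x) \]

and

\[ \overline{\int}_R f(x)\,dV(x) = \sum_{j=1}^{n} \overline{\int}_{R_j} f(x)\,dV(x). \]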


Proof. A partition of $R = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]$ is obtained by introducing
a number of partition points into each of the intervals $[a_j, b_j]$. If we introduce these
one at a time, we produce a sequence $\{P_n\}$ of subdivisions of $R$ such that, for each
$n$, $P_{n+1}$ is constructed from $P_n$ by introducing a point which divides some of the
subintervals for $P_n$ in the fashion of the preceding lemma. Thus, the lemma follows
from the preceding lemma and an induction argument on the number of partition
points that are introduced. □


Fubini’s Theorem. Let $S$ be an aligned rectangle in $\mathbb{R}^p$ and let $T$ be an aligned
rectangle in $\mathbb{R}^q$. Let $f$ be a bounded function on the aligned rectangle $R = S \times T$
in $\mathbb{R}^{p+q}$. We will denote the typical point of $S \times T$ by $(x,y)$ where $x \in S$ and $y \in T$.
If we hold $x \in S$ fixed and consider $f(x,y)$ as a function of $y \in T$, then this
function may or may not be integrable on $T$. In general, it will be integrable for
some values of $x$ and not for others. However, the upper and lower integrals of this
function of $y$ exist for all $x$ and yield new functions of $x$ on $S$ which also have upper
and lower integrals. The key step in the proof of Fubini’s Theorem is the following
theorem which relates these to the upper and lower integrals of $f$ over $S \times T$.


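Theorem 10.4.3. With $S$, $T$, and $f$ as above,

(10.4.2)
\[ \underline{\int}_{S\times T} f(x,y)\,dV(x,y) \le \underline{\int}_S \underline{\int}_T f(x,y)\,dV(y)\,dV(x) \le \overline{\int}_S \overline{\int}_T f(x,y)\,dV(y)\,dV(x) \le \overline{\int}_{S\times T} f(x,y)\,dV(x,y). \]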


Proof. The typical partition of S x T has the form P x Q, where P is a partition 
of S and Q is a partition of T. Recall that a partition of S consists of a partition of 
each of the intervals whose Cartesian product is $S$, while a partition of $T$ consists of
a partition of each of the intervals whose Cartesian product is T. Taken together, 
these partitions yield partitions of each of the intervals whose product is S x T. It 
is this partition of S x T that we denote by P x Q. 


Let $\{S_i\}_{i=1}^{n}$ be a list of the subrectangles of $S$ determined by the partition $P$
and let $\{T_j\}_{j=1}^{m}$ be a list of the subrectangles of $T$ determined by the partition $Q$.
Then $\{S_i \times T_j\}_{i,j}$ is a list of the subrectangles for the partition $P \times Q$. Let

\[ M_{ij} = \sup_{S_i \times T_j} f \quad\text{and}\quad m_{ij} = \inf_{S_i \times T_j} f. \]

Then, for $x \in S_i$, Theorem 10.1.11 implies

\[ m_{ij} V(T_j) \le \underline{\int}_{T_j} f(x,y)\,dV(y) \le \overline{\int}_{T_j} f(x,y)\,dV(y) \le M_{ij} V(T_j). \]

Applying Theorem 10.1.11 again, in the variable $x$, implies

\[ m_{ij} V(S_i) V(T_j) \le \underline{\int}_{S_i} \underline{\int}_{T_j} f(x,y)\,dV(y)\,dV(x) \le \overline{\int}_{S_i} \overline{\int}_{T_j} f(x,y)\,dV(y)\,dV(x) \le M_{ij} V(S_i) V(T_j). \]

If we sum this inequality over $i$ and $j$, note that $V(S_i)V(T_j) = V(S_i \times T_j)$, and
apply the preceding lemma, the result is

\[ L(f, P \times Q) \le \underline{\int}_S \underline{\int}_T f(x,y)\,dV(y)\,dV(x) \le \overline{\int}_S \overline{\int}_T f(x,y)\,dV(y)\,dV(x) \le U(f, P \times Q). \]

Since the two expressions in the middle of this inequality give an upper bound
for $\{L(f, P \times Q)\}$ and a lower bound for $\{U(f, P \times Q)\}$ and since the least upper
bound for $\{L(f, P \times Q)\}$ is $\underline{\int}_{S\times T} f(x,y)\,dV(x,y)$ and the greatest lower bound for
$\{U(f, P \times Q)\}$ is $\overline{\int}_{S\times T} f(x,y)\,dV(x,y)$, we conclude that (10.4.2) holds. □
In the case where f is integrable on S x T, this yields Fubini’s Theorem: 


Theorem 10.4.4. Let $S$ and $T$ be aligned rectangles in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively,
and let $f$ be an integrable function on $S \times T$. Then

(10.4.3)
\[ \int_{S\times T} f(x,y)\,dV(x,y) = \underline{\int}_S \underline{\int}_T f(x,y)\,dV(y)\,dV(x) = \overline{\int}_S \overline{\int}_T f(x,y)\,dV(y)\,dV(x). \]

Furthermore, if $f(x,y)$ is an integrable function of $y$ on $T$ for each fixed $x \in S$,
then $\int_T f(x,y)\,dV(y)$ is an integrable function of $x$ on $S$, and

(10.4.4)
\[ \int_{S\times T} f(x,y)\,dV(x,y) = \int_S \int_T f(x,y)\,dV(y)\,dV(x). \]


Proof. If $f$ is integrable on $S \times T$, then the first and last expressions in the string of
inequalities (10.4.2) are equal. Hence, each of the inequalities in (10.4.2) is actually
an equality in this case. This proves (10.4.3).

If $f(x,y)$ is an integrable function of $y$ on $T$ for each $x \in S$, then

\[ \underline{\int}_T f(x,y)\,dV(y) = \int_T f(x,y)\,dV(y) = \overline{\int}_T f(x,y)\,dV(y) \]

for each $x \in S$. Then (10.4.3) implies that

\[ \underline{\int}_S \int_T f(x,y)\,dV(y)\,dV(x) = \overline{\int}_S \int_T f(x,y)\,dV(y)\,dV(x), \]

which means that $\int_T f(x,y)\,dV(y)$ is an integrable function of $x$. Then (10.4.3)
implies (10.4.4). □


Remark 10.4.5. In (10.4.3) there is nothing special about the order in which the
iterated integrals are taken. The theorem is equally valid if we integrate first with
respect to $x$ and then with respect to $y$. Of course, for the analogue of (10.4.4) to
be valid with the order of integration reversed, we must assume that $f(x,y)$ is an
integrable function of $x$ for each fixed $y$.


This leads to the following consequence of Fubini’s Theorem. 




Theorem 10.4.6. Let $S$ and $T$ be aligned rectangles in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively,
and let $f(x,y)$ be an integrable function on $S \times T$ which is also integrable as a
function of $x$ for each fixed $y$ and integrable as a function of $y$ for each fixed $x$.
Then $\int_S f(x,y)\,dV(x)$ is an integrable function of $y$ on $T$ and $\int_T f(x,y)\,dV(y)$ is an
integrable function of $x$ on $S$, and

(10.4.5)
\[ \int_{S\times T} f(x,y)\,dV(x,y) = \int_S \int_T f(x,y)\,dV(y)\,dV(x) = \int_T \int_S f(x,y)\,dV(x)\,dV(y). \]


Note that the integrability conditions in this theorem will all be satisfied if f 
is a continuous function on the rectangle S x T. 


The ability to reverse the order of integration in an iterated integral is a real 
advantage, as the following example shows. 


Example 10.4.7. Find $\displaystyle \int_0^1 \int_0^{\sqrt{\pi}} y^3 \sin(xy^2)\,dy\,dx$.

Solution: Computing the inside integral looks difficult. However, if we reverse
the order of integration, the inside integral is just $\int_0^1 y^3 \sin(xy^2)\,dx = y - y\cos(y^2)$
and the iterated integral becomes

\[ \int_0^{\sqrt{\pi}} \int_0^1 y^3 \sin(xy^2)\,dx\,dy = \int_0^{\sqrt{\pi}} \bigl(y - y\cos(y^2)\bigr)\,dy = \pi/2. \]
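A quick numerical check of this example is sketched below. The limits of integration are hard to read in the source, so the code assumes $0 \le x \le 1$ and $0 \le y \le \sqrt{\pi}$; that choice is consistent with the stated inner integral $y - y\cos(y^2)$, but it is an assumption. The helper `midpoint` and the two function names are ours. Both orders of integration are approximated with the midpoint rule and compared with each other and with $\pi/2$.

```python
import math

def midpoint(func, lo, hi, n):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def f(x, y):
    return y**3 * math.sin(x * y**2)

X_MAX = 1.0
Y_MAX = math.sqrt(math.pi)   # assumed upper limit for y

def dy_then_dx(n=400):
    """Integrate in y first (the 'difficult' order), then in x."""
    return midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, Y_MAX, n), 0.0, X_MAX, n)

def dx_then_dy(n=400):
    """Integrate in x first (the reversed order), then in y."""
    return midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, X_MAX, n), 0.0, Y_MAX, n)

print(dy_then_dx(), dx_then_dy(), math.pi / 2)
```

Since $f$ is continuous on the rectangle, Theorem 10.4.6 applies, and the two approximations agree up to rounding because they sum the same grid values in a different order.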


Iterated Integrals over Non-rectangular Regions. A great advantage of inte- 
grals in one real variable is that we can often use the Fundamental Theorem of 
Calculus to calculate them. In order to take advantage of this, we would like to 
interpret an integral over a Jordan region $A$ in $\mathbb{R}^d$ as the result of repeated appli-
cations of integration in one variable. Fubini’s Theorem is the tool which allows us 
to do this. 


The issue is complicated by the fact that we wish to integrate over a Jordan
region, rather than over a rectangle. To do this, we replace the function $f$ to be
integrated with $f_A$, where $f$ is an integrable function on $A$ (then $f_A$ is an integrable
function on any aligned rectangle containing $A$). We then attempt to apply Fubini’s
Theorem repeatedly to express the integral of $f_A$ over a rectangle containing $A$ as
the result of a succession of single variable integrations. In order for this to work,
$A$ must have a special form.

We begin with a result which is a direct application of Fubini’s Theorem. It 
will form the basis for the induction argument in the proof of our main theorem. 
It concerns the case of an integral over a compact Jordan region A C R**1, which 
is constructed as follows: suppose there is a compact Jordan region B C R® such 
that A has the form 


A={ (x,t): 2 € Band U(x) <t< d(e)}, 




where $\psi$ and $\phi$ are continuous functions on $B$. In this case, $f_A(x,t) = 0$ if $x \notin B$
or if $t \notin [\psi(x), \phi(x)]$. Then (10.4.4) implies

Theorem 10.4.8. With $A$, $B$, $\psi$, and $\phi$ as above and with $f$ an integrable function
on $A$,

\[
\int_A f(x,t)\,dV(x,t) = \int_B\int_{\psi(x)}^{\phi(x)} f(x,t)\,dt\,dV(x),
\]

provided $f(x,t)$ is an integrable function of $t$ on $[\psi(x), \phi(x)]$ for each $x \in B$.

If we write

\[
g(x) = \int_{\psi(x)}^{\phi(x)} f(x,t)\,dt,
\]

then the previous theorem reduces the problem of computing $\int_A f(x,t)\,dV(x,t)$ to
the problem of computing the lower-dimensional integral $\int_B g(x)\,dV(x)$. This is the
basis for the induction argument in the proof of Theorem 10.4.10. Before we state
and prove that theorem, we need the following technical result.
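The reduction to a lower-dimensional integral can be sketched numerically in the simplest case, where $B = [a,b]$ is an interval. The Python sketch below is only an illustration under that assumption: `integrate_1d` is a plain midpoint rule, and the function names are ours rather than the text's.

```python
def integrate_1d(h, lo, hi, n=200):
    """Midpoint rule for a function of one variable on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

def integrate_between_graphs(f, a, b, psi, phi, n=200):
    """Approximate the integral of f over {(x,t): a<=x<=b, psi(x)<=t<=phi(x)}."""
    def g(x):
        # Inner single-variable integral in t for fixed x; this is g(x).
        return integrate_1d(lambda t: f(x, t), psi(x), phi(x), n)
    # Outer integral of g over B = [a, b].
    return integrate_1d(g, a, b, n)

# Area of the region between the parabola t = x^2 and the line t = x.
area = integrate_between_graphs(lambda x, t: 1.0, 0.0, 1.0,
                                lambda x: x**2, lambda x: x)
```

Here `area` approximates $\int_0^1 (x - x^2)\,dx = 1/6$.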


Theorem 10.4.9. Let $A$, $B$, $\psi$, $\phi$, and $f$ be as in the previous theorem. If $f$ is
continuous on $A$, then the function

\[
g(x) = \int_{\psi(x)}^{\phi(x)} f(x,t)\,dt
\]

is continuous on $B$.


Proof. Since $A$ is compact and $f$ is continuous on $A$, $|f|$ has a maximum on $A$. Let
$M_1$ be a positive number greater than or equal to this maximum.

Since $\psi$ and $\phi$ are continuous on $B$ and $\psi(x) \le \phi(x)$, the non-negative function
$\phi - \psi$ is also continuous and, hence, has a maximum. Let $M_2$ be a positive number
greater than or equal to this maximum.

Let $x_0$ be a point of $B$. We will prove that $g$ is continuous at $x_0$. We need to
consider two cases: (1) $\phi(x_0) - \psi(x_0) = 0$ and (2) $\phi(x_0) - \psi(x_0) > 0$.

In case (1), $g(x_0) = 0$. Furthermore, the continuity of $\phi - \psi$ implies that, given
$\epsilon > 0$, there is a $\delta > 0$ such that

\[
\phi(x) - \psi(x) < \frac{\epsilon}{M_1} \quad\text{whenever } x \in B \text{ and } \|x - x_0\| < \delta.
\]

Then,

\[
|g(x) - g(x_0)| = |g(x)| = \left|\int_{\psi(x)}^{\phi(x)} f(x,t)\,dt\right| \le M_1(\phi(x) - \psi(x)) < \epsilon.
\]

This completes the proof in case (1).
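In case (2), we have $\phi(x_0) - \psi(x_0) > 0$. Given $\epsilon > 0$, we may choose a positive
number $\rho$ such that

\[
\rho < \tfrac{1}{2}\bigl(\phi(x_0) - \psi(x_0)\bigr) \quad\text{and}\quad \rho < \frac{\epsilon}{12M_1}.
\]

We then set $a = \psi(x_0) + \rho$ and $b = \phi(x_0) - \rho$. Since $\psi$ and $\phi$ are continuous at $x_0$,
there is a $\delta > 0$ such that

\[
|\psi(x) - \psi(x_0)| < \rho \quad\text{and}\quad |\phi(x) - \phi(x_0)| < \rho,
\]

whenever $x \in B$ and $\|x - x_0\| < \delta$. For each such $x$, we have

\[
\psi(x) < a < b < \phi(x).
\]

Also, each of the intervals $[\psi(x), a]$ and $[b, \phi(x)]$ has length less than $2\rho$, while each of
the intervals $[\psi(x_0), a]$ and $[b, \phi(x_0)]$ has length $\rho$, and so the sum of the lengths of
these four intervals is less than $6\rho$.

Since $f$ is continuous on the compact set $A$, it is uniformly continuous on $A$.
Hence, we may choose $\delta$ small enough that it is also true that

\[
|f(x_1,t_1) - f(x_2,t_2)| < \frac{\epsilon}{2M_2}
\]

whenever $(x_1,t_1)$ and $(x_2,t_2)$ are in $A$ and $\|(x_1,t_1) - (x_2,t_2)\| < \delta$. In particular,
whenever $\|x - x_0\| < \delta$,

\[
|f(x,t) - f(x_0,t)| < \frac{\epsilon}{2M_2}
\]

provided that $(x,t)$ and $(x_0,t)$ are both in $A$. Then, since $b - a \le M_2$,

\[
\begin{aligned}
|g(x) - g(x_0)| &= \left|\int_{\psi(x)}^{\phi(x)} f(x,t)\,dt - \int_{\psi(x_0)}^{\phi(x_0)} f(x_0,t)\,dt\right|\\
&\le \left|\int_{\psi(x)}^{a} f(x,t)\,dt\right| + \left|\int_{b}^{\phi(x)} f(x,t)\,dt\right| + \int_a^b |f(x,t) - f(x_0,t)|\,dt\\
&\qquad + \left|\int_{\psi(x_0)}^{a} f(x_0,t)\,dt\right| + \left|\int_{b}^{\phi(x_0)} f(x_0,t)\,dt\right|\\
&< 6\rho M_1 + \frac{\epsilon}{2M_2}M_2 < \epsilon.
\end{aligned}
\]

This completes the proof in case (2). □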


We can now state and prove the form of Fubini’s Theorem which represents an 
integral over a Jordan region as the result of repeated single variable integrations. 


Theorem 10.4.10. Suppose $f$ is an integrable function on the closed Jordan region
$A$. Suppose also that $A$ is the set of $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ which satisfy the
inequalities

\[
\begin{aligned}
\psi_1 &\le x_1 \le \phi_1,\\
\psi_2(x_1) &\le x_2 \le \phi_2(x_1),\\
&\ \,\vdots\\
\psi_d(x_1,\dots,x_{d-1}) &\le x_d \le \phi_d(x_1,\dots,x_{d-1}),
\end{aligned}
\]

where $\psi_1$ and $\phi_1$ are numbers and $\psi_j(x_1,\dots,x_{j-1})$ and $\phi_j(x_1,\dots,x_{j-1})$ are continuous
functions on the set of $(x_1,\dots,x_{j-1})$ which satisfy the inequalities in this
list that precede the $j$th one. Then

\[
\int_A f(x)\,dV(x) = \int_{\psi_1}^{\phi_1}\int_{\psi_2(x_1)}^{\phi_2(x_1)}\cdots\int_{\psi_d(x_1,\dots,x_{d-1})}^{\phi_d(x_1,\dots,x_{d-1})} f(x_1,\dots,x_d)\,dx_d\cdots dx_2\,dx_1, \tag{10.4.6}
\]

provided that each of the successive iterated integrals exists. This condition is
satisfied if $f$ is continuous on $A$.


Proof. We prove this by induction on d. If d = 1, then there is nothing to prove, 
since the two sides of (10.4.6) are the same integral over an interval in this case. 


Now suppose the theorem is true in dimension $d-1$. To complete the proof, we
need to prove that it is then true in dimension $d$. Let $A$ be a Jordan region defined
by $d$ inequalities as in the hypothesis of the theorem and let $f$ be an integrable
function on $A$. Let $B$ be the set defined by the first $d-1$ of these inequalities. Then
$A$, $B$, and $f$ satisfy the conditions of Theorem 10.4.8. Hence, if $x = (\tilde{x}, x_d)$ where
$\tilde{x} = (x_1,\dots,x_{d-1})$ and if $f(\tilde{x}, x_d)$ is an integrable function of $x_d$ on $[\psi_d(\tilde{x}), \phi_d(\tilde{x})]$ for
each $\tilde{x} \in B$, then this theorem implies that $g(\tilde{x}) = \int_{\psi_d(\tilde{x})}^{\phi_d(\tilde{x})} f(\tilde{x}, x_d)\,dx_d$ is integrable
on $B$ and

\[
\int_A f(x)\,dV(x) = \int_B\int_{\psi_d(\tilde{x})}^{\phi_d(\tilde{x})} f(\tilde{x}, x_d)\,dx_d\,dV(\tilde{x}). \tag{10.4.7}
\]

Now the set $B$ and the function $g$ satisfy the conditions of our theorem in dimension
$d-1$. Since we are assuming the theorem is true in dimension $d-1$, we have

\[
\int_B g(\tilde{x})\,dV(\tilde{x}) = \int_{\psi_1}^{\phi_1}\int_{\psi_2(x_1)}^{\phi_2(x_1)}\cdots\int_{\psi_{d-1}(x_1,\dots,x_{d-2})}^{\phi_{d-1}(x_1,\dots,x_{d-2})} g(x_1,\dots,x_{d-1})\,dx_{d-1}\cdots dx_2\,dx_1.
\]

If we combine this with (10.4.7), the result is (10.4.6).


It remains to prove that each of the successive iterated integrals exists if $f$ is
continuous on $A$. However, this also follows from induction on $d$. It is clearly true
if $d = 1$ since a continuous function on an interval is integrable. Assuming it is
true in dimension $d-1$, then if $f$ is continuous on an $A$ of the form described in
the theorem in dimension $d$, we conclude that $f$ is continuous, hence, integrable in
its last variable and the function $g$, defined by integrating in this last variable, is
continuous on the corresponding set $B$ by Theorem 10.4.9. Since we are assuming
the result to be true in dimension $d-1$, we conclude that each of the successive
iterated integrals of $g$ exists. Hence, the same thing is true of $f$. □


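Example 10.4.11. Find $\int_A xyz\,dV(x,y,z)$ if $A$ is the Jordan region in $\mathbb{R}^3$ defined
by the inequalities $0 \le x \le 1$, $0 \le y \le x$, $0 \le z \le 1 - x^2$.

Solution: According to the previous theorem,

\[
\begin{aligned}
\int_A xyz\,dV(x,y,z) &= \int_0^1\int_0^x\int_0^{1-x^2} xyz\,dz\,dy\,dx\\
&= \frac{1}{2}\int_0^1\int_0^x xy(1-x^2)^2\,dy\,dx\\
&= \frac{1}{4}\int_0^1 (x^3 - 2x^5 + x^7)\,dx = \frac{1}{96}.
\end{aligned}
\]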
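The value of this iterated integral can be checked in two ways. The Python sketch below assumes the region $0 \le x \le 1$, $0 \le y \le x$, $0 \le z \le 1 - x^2$; it integrates the polynomial $\frac{1}{4}(x^3 - 2x^5 + x^7)$ exactly with rational arithmetic and, independently, nests three midpoint rules. The helper names are ours.

```python
from fractions import Fraction

# After the z- and y-integrations the integrand is (1/4)(x^3 - 2x^5 + x^7);
# coefficients are stored as {power: coefficient}.
poly = {3: Fraction(1), 5: Fraction(-2), 7: Fraction(1)}

def integrate_poly_0_1(coeffs):
    """Exact integral over [0, 1] of sum(c * x**p)."""
    return sum(c / (p + 1) for p, c in coeffs.items())

exact = Fraction(1, 4) * integrate_poly_0_1(poly)

def midpoint(h, lo, hi, n=40):
    """Midpoint rule for a function of one variable on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

# Nested single-variable integrations: z innermost, then y, then x.
numeric = midpoint(
    lambda x: midpoint(
        lambda y: midpoint(lambda z: x * y * z, 0.0, 1.0 - x * x),
        0.0, x),
    0.0, 1.0)
```

The exact computation gives `Fraction(1, 96)`, and `numeric` agrees with it to within the discretization error of the outer midpoint rule.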


Exercise Set 10.4

1. Find the integral over the square [-7, 7] × [-7, 7] of the function g of Exercise
   10.3.15.

                 1 1
2. Evaluate | [ aa pape vt.

3. Find the area of the triangle $A$ with vertices at $(0,0)$, $(a,0)$, $(a,b)$ by calculating
   $\int_A 1\,dV(x,y)$ (use Theorem 10.4.10).

4. Calculate the area of a disc of radius 1 by representing it as the integral of 1 over
   the disc, expressing this integral as an iterated integral, and then evaluating
   this iterated integral.

5. Interpret the iterated integral $\int_0^1\int_{x^2}^{x} (x^2 + y^2)\,dy\,dx$ as an integral of $x^2 + y^2$
   over a certain Jordan region in $\mathbb{R}^2$. This, in turn, is equal to a certain iterated
   integral, first with respect to $x$ and then with respect to $y$. Describe this integral
   and then evaluate it.

6. Write down an integral in $\mathbb{R}^3$ which represents the volume of a sphere of radius
   1. Then express this as a triple iterated integral. You do not need to evaluate
   this integral.

7. Find $\int_A x\,dV(x,y,z)$ if $A$ is defined by the inequalities

   \[
   0 \le x \le 1, \quad 0 \le y \le x^2, \quad 0 \le z \le x + y.
   \]

8. Show that if $f$ and $g$ are continuous real-valued functions on a Jordan region
   $B \subset \mathbb{R}^d$ and if $g(x) \le f(x)$ for all $x \in B$, then the region of Exercise 10.2.12,
   $A = \{(x,t) \in \mathbb{R}^{d+1} : x \in B \text{ and } g(x) \le t \le f(x)\}$, has volume

   \[
   V(A) = \int_B (f(x) - g(x))\,dV(x).
   \]

9. Prove that if $A$ is any bounded subset of $\mathbb{R}^p$ and $B$ is a subset of $\mathbb{R}^q$ of volume
   0, then $A \times B$ is a subset of $\mathbb{R}^{p+q}$ of volume 0. Use this to prove that the
   Cartesian product $A \times B$ of two Jordan regions is a Jordan region.

10. Use Fubini's Theorem and the previous exercise to prove that if $A \subset \mathbb{R}^p$ and
    $B \subset \mathbb{R}^q$ are Jordan regions, then $V(A \times B) = V(A)V(B)$.




11. Suppose $A$ is a compact Jordan region in $\mathbb{R}^p$, $B$ is a compact subset of $\mathbb{R}^q$,
    and $f$ is a continuous function on $B \times A$. Prove that $\int_A f(x,y)\,dV(y)$ is a
    continuous function of $x$ on $B$. Hint: This is similar to but not exactly the
    same as Theorem 10.4.9.


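12. Prove that if $f(t,x)$ is a continuous function on $I \times A$, where $I$ is an open
    interval in $\mathbb{R}$ and $A$ is a compact Jordan region in $\mathbb{R}^d$, and if $\frac{\partial f}{\partial t}(t,x)$ exists
    and is continuous on $I \times A$, then

    \[
    \frac{d}{dt}\int_A f(t,x)\,dV(x) = \int_A \frac{\partial f}{\partial t}(t,x)\,dV(x).
    \]

    Hint: Fix $t$ and consider the function

    \[
    g(h,x) = \begin{cases} \dfrac{f(t+h,x) - f(t,x)}{h} & \text{if } h \ne 0,\\[1ex] \dfrac{\partial f}{\partial t}(t,x) & \text{if } h = 0. \end{cases}
    \]

    Show that this is a continuous function of $(h,x)$ on $J \times A$ for some interval $J$
    containing 0 (the Mean Value Theorem is useful in proving this). Then apply
    the preceding exercise.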


10.5. The Change of Variables Formula 


Recall the substitution formula (Theorem 5.3.6) from Chapter 5:

\[
\int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du.
\]

Here, if $I = [a,b]$ and $J = g(I)$, then $f$ is assumed continuous on $J$ and $g$ is assumed
differentiable with an integrable derivative on $I$.


This can be thought of as a change of variables formula, where $u = g(t)$ is
the transformation from the variable $t$ to the variable $u$, and the integral formula
relates the integral of $f$ as a function of $u$ to an integral involving the composite
function $f \circ g$ as a function of $t$. The formula requires an extra factor $g'(t)$ in the
integrand of the latter integral. This is related to how the transformation $g$ changes
lengths.
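A quick numerical illustration of the one-variable formula may be helpful. The Python sketch below uses the sample choices $f = \cos$, $g(t) = t^2$, and $[a,b] = [0,1]$, which are ours and not taken from the text, and approximates both sides with a midpoint rule.

```python
import math

def midpoint(h, lo, hi, n=2000):
    """Midpoint rule for a function of one variable on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (k + 0.5) * step) for k in range(n))

f = math.cos                  # sample integrand (an assumption)
g = lambda t: t * t           # sample transformation u = g(t)
g_prime = lambda t: 2 * t     # its derivative
a, b = 0.0, 1.0

left = midpoint(lambda t: f(g(t)) * g_prime(t), a, b)   # integral in t
right = midpoint(f, g(a), g(b))                          # integral in u
```

Both `left` and `right` approximate $\sin(1)$; the factor `g_prime(t)` is what compensates for the way $g$ stretches lengths.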


In this section we will derive a similar formula for integrals in several variables.
This formula will tell us how an integral transforms under a smooth transformation
$\phi$ from a Jordan region in $\mathbb{R}^d$ to another region in $\mathbb{R}^d$. In this case, the extra factor
that is needed is $|\det d\phi|$, where $d\phi$ is the differential of $\phi$.

The proof of the change of variables formula for integrals in several variables
is quite technical.


An outline of the steps involved is as follows: 


(1) We first show that a linear transformation with matrix A transforms a Jordan 
region into a Jordan region with volume |det A| times the volume of the orig- 
inal region. This involves factoring the matrix into a product of elementary 
matrices and then noting how each of these affects volume. 




(2) We then prove that the image under a smooth transformation of a rectangle 
is a Jordan region. 

(3) Next, for a smooth transformation $\phi$ and a small rectangle $R$ in its domain, we
develop a rather precise estimate of how much the volume of $\phi(R)$ can differ
from the volume of $d\phi(R)$. This exploits the fact that $d\phi(a)$ is the linear part
of the best affine approximation to $\phi$ near $a$.


(4) To prove our formula for integration over a rectangle, we use an argument
involving a sequence of nested rectangles to show that if the two sides of
our proposed formula are not equal for the original rectangle, then for some
sufficiently small rectangle in our nested sequence, they will differ by more
than is possible given the estimate we derived in step (3).


(5) Finally, it is fairly easy to prove that if the formula holds for integration on 
rectangles, then it also holds for integration on Jordan regions. 


Factorization of Matrices. We begin by studying how a linear transformation 
affects the volume of a Jordan region. The simple way to do this is to factor a 
given linear transformation as a product of elementary linear transformations whose 
effect on volume is easy to determine. Such a factorization is given by the process 
of Gauss elimination (row reduction). The elementary linear transformations in 
this factorization correspond to the elementary matrices as described below. 


The elementary d x d matrices are of three types: 


(1) The interchange matrices $E_{ij}$. For $i \ne j$, the interchange matrix $E_{ij}$ is
obtained from the identity matrix by interchanging its $i$th and $j$th rows.


(2) The shear matrices $S_{ij}$. For $i \ne j$ the shear matrix $S_{ij}$ is obtained from the
identity matrix by adding its $j$th row to its $i$th row; that is, by adding a 1
to the $ij$ position in the identity matrix.

(3) The scale matrices $T_i(a)$. For $i = 1,\dots,d$ and $a \ne 0$, $T_i(a)$ is obtained from
the identity matrix by multiplying its $i$th row by the scalar $a$; that is, it is
the matrix that has $a$ in the $i$th position on the main diagonal, 1 in the other
positions on the main diagonal, and 0 in all other positions.


Note that if $A$ is any $d \times d$ matrix, then $E_{ij}A$ is the result of interchanging the
$i$th and $j$th rows in $A$ and leaving the other rows unchanged, $S_{ij}A$ is the result of
adding the $j$th row of $A$ to its $i$th row and leaving all but the $i$th row unchanged,
while $T_i(a)A$ is the result of multiplying the $i$th row of $A$ by $a$ and leaving the other
rows unchanged.
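The three types of elementary matrices and their effect under left multiplication can be made concrete in a few lines of Python. This is a minimal sketch using nested lists and 0-based row indices; the function names and the sample matrix `A` are our own illustrative choices.

```python
def identity(d):
    return [[1 if r == c else 0 for c in range(d)] for r in range(d)]

def interchange(d, i, j):
    """E_ij: identity with rows i and j swapped (0-based indices)."""
    m = identity(d)
    m[i], m[j] = m[j], m[i]
    return m

def shear(d, i, j):
    """S_ij: identity with an extra 1 in position (i, j), i != j."""
    m = identity(d)
    m[i][j] = 1
    return m

def scale(d, i, a):
    """T_i(a): identity with a in the i-th diagonal position, a != 0."""
    m = identity(d)
    m[i][i] = a
    return m

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[r][k] * q[k][c] for k in range(len(q)))
             for c in range(len(q[0]))] for r in range(len(p))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
swapped = matmul(interchange(3, 0, 2), A)   # rows 0 and 2 exchanged
sheared = matmul(shear(3, 0, 1), A)         # row 1 added to row 0
scaled = matmul(scale(3, 1, -2), A)         # row 1 multiplied by -2
```

In each case only the rows named in the description above change; all other rows of `A` are left as they were.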

The process of Gauss elimination is that of successively multiplying a matrix
$A$ on the left by elementary matrices until what is left is a matrix of reduced row
echelon form. In the case of a non-singular matrix $A$ its reduced row echelon form
is just the identity matrix. Thus, for each non-singular $d \times d$ matrix $A$ there is a
matrix $B$ which is a product of elementary matrices and satisfies $BA = I$. Then

\[
A = B^{-1}.
\]

Note that the inverse of an elementary matrix is an elementary matrix or a product
of elementary matrices (Exercise 10.5.1) and so $B^{-1}$ is also a product of elementary
matrices. Thus, we have proved

Theorem 10.5.1. Each non-singular $d \times d$ matrix $A$ is a product of matrices of
the form $E_{ij}$, $S_{ij}$, $T_i(a)$.


The determinants of the elementary matrices are easily calculated. 


Theorem 10.5.2. For each $i$ and each $j \ne i$ we have $\det E_{ij} = -1$, $\det S_{ij} = 1$, and
$\det T_i(a) = a$.


Since the determinant is multiplicative ($\det AB = \det A \det B$ for all pairs $A$,
$B$ of $d \times d$ matrices), it follows that the determinant of a given non-singular matrix
$A$ is, up to sign, just the product of the scale factors $a$ that appear in its factorization as a
product of elementary matrices; each interchange in the factorization contributes a factor of $-1$.
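The factorization and the determinant bookkeeping can be sketched in Python. The routine below is one possible way to carry out the row reduction using only the three operations of the text (adding a multiple $e$ of one row to another is done as a scale, a shear, and a second scale that restores the row). The function name, the tuple encoding of the operations, and the use of exact `Fraction` arithmetic are our own choices.

```python
from fractions import Fraction

def factor_into_elementary(a):
    """Row-reduce a non-singular matrix to I using only interchanges E_ij,
    shears S_ij and scalings T_i(s).  Returns the operations applied (in
    order), det(a) computed from them, and the reduced matrix."""
    m = [[Fraction(v) for v in row] for row in a]
    d = len(m)
    ops = []                  # ("E", i, j), ("S", i, j) or ("T", i, s)
    det_of_b = Fraction(1)    # determinant of the product B of all operations

    def swap(i, j):
        nonlocal det_of_b
        m[i], m[j] = m[j], m[i]
        ops.append(("E", i, j))
        det_of_b *= -1

    def scale(i, s):
        nonlocal det_of_b
        m[i] = [s * v for v in m[i]]
        ops.append(("T", i, s))
        det_of_b *= s

    def shear(i, j):
        m[i] = [u + v for u, v in zip(m[i], m[j])]
        ops.append(("S", i, j))

    for c in range(d):
        pivot = next((r for r in range(c, d) if m[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        if pivot != c:
            swap(c, pivot)
        scale(c, 1 / m[c][c])
        for r in range(d):
            e = m[r][c]
            if r != c and e != 0:
                scale(c, -e)        # row c becomes -e times row c
                shear(r, c)         # row r loses e times the old row c
                scale(c, -1 / e)    # restore row c
    # B A = I, so det(A) = 1 / det(B).
    return ops, 1 / det_of_b, m

ops, det_a, reduced = factor_into_elementary([[0, 2], [3, 4]])
```

For this sample matrix one interchange is needed, and `det_a` comes out as $-6$, in agreement with $0\cdot 4 - 2\cdot 3$.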


Linear Transformations and Volume. The next theorem tells us how the vol- 
ume of a Jordan region is affected by a linear transformation given by an elementary 
matrix. 


Theorem 10.5.3. Each of the elementary matrices takes a Jordan region to a Jordan
region. A shear transformation $S_{ij}$ takes a Jordan region to a Jordan region of
the same volume, as does an elementary interchange $E_{ij}$. The scale transformation
$T_i(a)$ takes a Jordan region to a Jordan region of volume $|a|$ times the volume of
the original region.


Proof. Each elementary interchange $E_{ij}$ takes each aligned rectangle to an aligned
rectangle of the same volume. Clearly this means that it preserves inner volume,
outer volume, and volume. Since a bounded region is a Jordan region if and only
if its boundary has outer volume 0, each $E_{ij}$ takes a Jordan region to a Jordan
region of the same volume.


The scale matrix $T_i(a)$ takes an aligned rectangle to an aligned rectangle which
has been stretched (or shrunk) by a factor of $|a|$ in one dimension, while its other
dimensions remain the same. Hence, the image rectangle has volume $|a|$ times the
volume of the original rectangle. As above, this implies that it takes a Jordan
region to a Jordan region of volume $|a|$ times the volume of the original.

The shear matrices $S_{ij}$ also preserve volumes of Jordan regions, but the proof
of this fact is a little more complicated.

The shear matrix $S_{12}$ on $\mathbb{R}^2$ is the matrix

\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]

It takes the aligned rectangle $[a,b] \times [c,d]$, which has vertices $(a,c)$, $(b,c)$, $(b,d)$, and
$(a,d)$, to the parallelogram with vertices $(a+c,c)$, $(b+c,c)$, $(b+d,d)$, and $(a+d,d)$.
This parallelogram has base of length $(b+c) - (a+c) = b - a$ and height $d - c$.
Thus, its area is $(b-a)(d-c)$ (Exercise 10.2.1), which is the same as the volume
of the original rectangle.
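The area computation for the sheared rectangle can be checked directly. The Python sketch below applies $S_{12}$ to the vertices of a sample rectangle (the numbers $a, b, c, d$ are arbitrary choices of ours) and compares areas using the standard shoelace formula for polygons.

```python
def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    n = len(vertices)
    twice_area = sum(
        vertices[k][0] * vertices[(k + 1) % n][1]
        - vertices[(k + 1) % n][0] * vertices[k][1]
        for k in range(n))
    return abs(twice_area) / 2

def shear_12(point):
    """Apply S_12 on R^2: (x, y) -> (x + y, y)."""
    x, y = point
    return (x + y, y)

a, b, c, d = 1, 4, 2, 7                      # sample rectangle [a,b] x [c,d]
rectangle = [(a, c), (b, c), (b, d), (a, d)]
parallelogram = [shear_12(p) for p in rectangle]
rect_area = shoelace_area(rectangle)
para_area = shoelace_area(parallelogram)
```

Both areas equal $(b-a)(d-c) = 15$ for this choice, as the argument above predicts.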


In general, an aligned rectangle $R$ in $\mathbb{R}^d$ for $d > 2$ has the form $S \times T$ where
$S$ is an aligned rectangle in $\mathbb{R}^2$ and $T$ is an aligned rectangle in $\mathbb{R}^{d-2}$. The shear
transformation $S_{12}$ on $\mathbb{R}^d$ sends this to $P \times T$ where, by the above discussion, $P$ is
a parallelogram with the same area as $S$. It follows from this and Exercise 10.4.10
that $S_{12}$ sends $R$ to a Jordan region with the same volume as $R$. Since, for any
$i \ne j$, $S_{ij}$ is just $S_{12}$ composed with some elementary interchanges, it follows that
it also takes an aligned rectangle to a Jordan region with the same volume.


Let $A$ be a Jordan region, let $R$ be an aligned rectangle containing $A$, and
let $P$ be a partition of $R$. Let $R_1, R_2, \dots, R_n$ be a list of the subrectangles of $R$
determined by the partition $P$. Set

\[
E = \bigcup\{R_k : R_k \subset A\}, \qquad F = \bigcup\{R_k : R_k \cap A \ne \emptyset\}.
\]

Then $U(\chi_A, P) = V(F)$ and $L(\chi_A, P) = V(E)$. Since $A$ is a Jordan region, given
$\epsilon > 0$, there is a partition $P$ such that $V(F) - V(E) < \epsilon$. Of course, regardless of
how the partition is chosen,

\[
V(E) \le V(A) \le V(F). \tag{10.5.1}
\]

Note that $S_{ij}F$ is the union of those $S_{ij}R_k$ such that $R_k \cap A \ne \emptyset$, and any two
of these sets meet (if at all) in a set of volume 0. Since $V(S_{ij}R_k) = V(R_k)$, we
conclude that

\[
V(S_{ij}F) = V(F).
\]

A similar argument shows that

\[
V(S_{ij}E) = V(E).
\]

Hence,

\[
V(E) = V(S_{ij}E) \le \underline{V}(S_{ij}A) \le \overline{V}(S_{ij}A) \le V(S_{ij}F) = V(F). \tag{10.5.2}
\]

Since $V(F) - V(E) < \epsilon$, we conclude that

\[
\overline{V}(S_{ij}A) - \underline{V}(S_{ij}A) < \epsilon.
\]

Since $\epsilon$ was arbitrary, this difference is actually 0. This proves that $S_{ij}A$ is a Jordan
region. That it has the same volume as $A$ follows from (10.5.1) and (10.5.2). □
Theorem 10.5.4. If $L : \mathbb{R}^d \to \mathbb{R}^d$ is a linear transformation and $E$ is a Jordan
region, then $L(E)$ is also a Jordan region and $V(L(E)) = |\det L|\,V(E)$, where $\det L$
denotes the determinant of the matrix corresponding to $L$.


Proof. We first note that if this theorem is true for linear transformations $L_1$ and
$L_2$, then it is also true for the composition $L_1 \circ L_2$, by the following computation:

\[
V(L_1 \circ L_2(E)) = |\det L_1|\,V(L_2(E)) = |\det L_1|\,|\det L_2|\,V(E) = |\det L_1L_2|\,V(E),
\]

since determinant and absolute value are both multiplicative functions.


The elementary interchanges $E_{ij}$ and shear transformations $S_{ij}$ do not affect
volume and they are matrices of determinant $\pm 1$. Thus, the theorem is true for
these linear transformations.

The scale matrix $T_i(a)$ takes each aligned rectangle to an aligned rectangle with
volume $|a|$ times the volume of the original. Since $a = \det T_i(a)$, the theorem is
true for the transformations $T_i(a)$.




Since every non-singular $d \times d$ matrix is a product of interchanges, shear
transformations, and scale transformations, the theorem is true for all non-singular linear
functions from $\mathbb{R}^d$ to $\mathbb{R}^d$.


If $L$ is singular, then its determinant is 0. Thus, to finish the proof, we need to
show that if $L$ is a singular linear transformation, then $V(L(E)) = 0$ for every Jordan
region $E$. We leave this as an exercise. □


Example 10.5.5. If $L : \mathbb{R}^2 \to \mathbb{R}^2$ is the linear transformation with matrix


(3%) 


what is the area of the image of the unit disc $D_1(0,0)$ under the transformation $L$?
Solution: The unit disc has area $\pi$. By the previous theorem, its image under
$L$ has area $|\det L|\,\pi = 2\pi$.

Example 10.5.6. What is the area of an ellipse with two vertices at distance 3
from $(0,0)$ along the line $y = x$ and two vertices at distance 2 from $(0,0)$ along the
line $y = -x$?

Solution: This ellipse may be obtained from the unit disc by first applying
the transformation with matrix

\[
\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}
\]

and then applying the linear transformation which is rotation through the angle
$\pi/4$. The first transformation has determinant 6, while the second has determinant
1. Hence the area of the indicated ellipse is $6\pi$.
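The computation in this example can be reproduced numerically. The Python sketch below assumes that the first transformation is the diagonal stretch with entries 3 and 2 (the choice consistent with the stated determinant 6 and the stated vertex distances), composes it with rotation through $\pi/4$, and applies Theorem 10.5.4. The helper names are ours.

```python
import math

def matmul2(p, q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

stretch = [[3.0, 0.0], [0.0, 2.0]]           # assumed diagonal stretch
theta = math.pi / 4
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
composite = matmul2(rotation, stretch)       # stretch first, then rotate

# V(L(E)) = |det L| V(E), and the unit disc has area pi.
ellipse_area = abs(det2(composite)) * math.pi

# The image of (1, 0) is a vertex of the ellipse.
vertex = (composite[0][0], composite[1][0])
```

The point `vertex` lies on the line $y = x$ at distance 3 from the origin, and `ellipse_area` agrees with $6\pi$ up to rounding.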


Smooth Image of a Rectangle. We will prove that, under appropriate conditions,
the image of an aligned rectangle under a smooth map is a Jordan region.
We first prove that the image of a degenerate rectangle under such a map is a set
of volume 0.


Theorem 10.5.7. Let $\phi$ be a one-to-one smooth transformation from an open set
$U \subset \mathbb{R}^p$ to $\mathbb{R}^p$ and suppose $d\phi(x)$ is non-singular at each point of $U$. If $R$ is a
degenerate aligned rectangle contained in $U$, then $\phi(R)$ is a set of volume 0 in $\mathbb{R}^p$.


Proof. Since $R$ is degenerate, it is a rectangle of dimension at most $p - 1$. We
may as well assume that it is contained in $\mathbb{R}^{p-1} = \{(x_1,\dots,x_p) \in \mathbb{R}^p : x_p = 0\}$.
Let $a$ be a point of $R$. We will show first that there is a neighborhood of $b = \phi(a)$
whose intersection with $\phi(R)$ has volume 0. If we can do this for each $a \in R$, then,
since $\phi(R)$ is compact, we may cover $\phi(R)$ with finitely many open sets whose
intersections with $\phi(R)$ have volume 0. It follows from this that $\phi(R)$ itself has
volume 0.

Since translations do not affect volume, we may as well assume that $a$ and $b = \phi(a)$
are both equal to 0. Also, since applying a non-singular linear transformation
does not affect whether or not a set has volume 0, we may replace $\phi$ by $(d\phi(0))^{-1}\circ\phi$.
In other words, we may as well assume that $d\phi(0) = I$, the identity transformation.

If $\phi = (\phi_1,\dots,\phi_p)$ and points of $\mathbb{R}^p$ are denoted by $(x,y)$ with $x \in \mathbb{R}^{p-1}$ and
$y \in \mathbb{R}$, then we define $g : U \cap \mathbb{R}^{p-1} \to \mathbb{R}^{p-1}$ by

\[
g(x) = (\phi_1(x,0),\dots,\phi_{p-1}(x,0)).
\]

Then $dg(0)$ is the upper left $(p-1) \times (p-1)$ submatrix of $d\phi(0)$ and so it too
is the identity transformation. The Inverse Function Theorem then implies that
there are neighborhoods $V$ and $W$ of 0 in $\mathbb{R}^{p-1}$ such that $g$ maps $V$ onto $W$ and
has a smooth inverse $g^{-1} : W \to V$. Then

\[
\phi(g^{-1}(x), 0) = (x, \phi_p(g^{-1}(x), 0))
\]

for $x \in W$. That is, the part of $\phi(R)$ consisting of points with first coordinate in $W$
is the graph of the smooth function $x \mapsto \phi_p(g^{-1}(x), 0)$. It therefore has volume 0 by Example
10.2.11. This completes the proof. □


Theorem 10.5.8. Let $\phi : U \to \mathbb{R}^p$ satisfy the conditions of the previous theorem.
If $R$ is a rectangle in $U$, then $\phi(R)$ is a Jordan region.


Proof. If $R$ is a rectangle in $U$, then its boundary is a union of finitely many
rectangles of dimension $p - 1$; that is, it is the union of finitely many degenerate
rectangles. The image of each of these under $\phi$ has volume 0 by the previous
theorem. Hence, $\phi(\partial R)$ has volume zero. The proof will be complete if we can
show that $\partial\phi(R) = \phi(\partial R)$.

The image of $\phi$ is an open set $V$ by Exercise 9.6.8, and $\phi : U \to V$ is one-to-one
and onto. Thus, $\phi$ has an inverse transformation $\phi^{-1} : V \to U$ which is a smooth
transformation, by the Inverse Function Theorem. It is, in particular, continuous.
Since both $\phi$ and $\phi^{-1}$ are continuous, a subset $A \subset U$ is open if and only if its
image $\phi(A) \subset V$ is open. It follows that $\phi$ takes the interior of $R$ to the interior of
$\phi(R)$ and, hence, the boundary of $R$ to the boundary of $\phi(R)$. □


Integral over the Smooth Image of a Rectangle. Our next objective is to
prove the change of variables formula for integration over a rectangle. We will
need the following lemma, which says that the relative error in approximating the
volume of the image of a rectangle under a smooth map by the volume of its image
under the differential of the map can be made arbitrarily small. In the lemma, it
is crucial that we don't allow rectangles $R$ to become too skinny. By this, we mean
that we don't want the ratio of the length of the shortest edge of $R$ to the diameter
of $R$ (greatest distance between two points of $R$) to be too small. We will call this
ratio the aspect ratio of the rectangle. A rectangle with all edges equal in length
has the largest possible aspect ratio, $1/\sqrt{d}$.

Lemma 10.5.9. Let $\lambda$ and $K$ be positive constants, with $\lambda \le 1/\sqrt{d}$. Let $U$ be an open
subset of $\mathbb{R}^d$ and let $\phi : U \to \mathbb{R}^d$ be a smooth one-to-one transformation. Suppose
$d\phi(a)$ is non-singular and $|\det d\phi(a)| \le K$ for all $a \in U$. Then, given $\epsilon > 0$, there
is a $\delta > 0$ such that if $R$ is a rectangle in $U$ with diameter less than $\delta$ and aspect
ratio at least $\lambda$, then $|V(\phi(R)) - V(d\phi(a)R)| \le \epsilon V(R)$, where $a$ is the center of the
rectangle $R$.


Proof. Let $R$ be a rectangle in $U$ with diameter less than a positive number $\delta$ to
be determined below and aspect ratio at least $\lambda$. Note that $\phi(R)$ is a Jordan region
by the previous theorem and, hence, it has volume.

Since translation does not affect volume, we may assume that the center of the
rectangle $R$ is 0 and that $\phi(0) = 0$. By hypothesis

\[
|\det d\phi(0)| \le K. \tag{10.5.3}
\]




If $0 < \rho < 1$, then $(1+\rho)R$ is the rectangle created from $R$ by expanding each
edge in a symmetric way about its center by the factor $1+\rho$. Similarly, $(1-\rho)R$
is the rectangle created from $R$ by shrinking each edge in a symmetric way about
its center by the factor $1-\rho$. Also,

\[
(1-\rho)R \subset R \subset (1+\rho)R,
\]

and, since $d\phi(0)$ is linear,

\[
(1-\rho)\,d\phi(0)R \subset d\phi(0)R \subset (1+\rho)\,d\phi(0)R.
\]

Comparing volumes and using (10.5.3) yields

\[
\begin{aligned}
V((1+\rho)\,d\phi(0)R) - V((1-\rho)\,d\phi(0)R) &= \bigl((1+\rho)^d - (1-\rho)^d\bigr)V(d\phi(0)R)\\
&= \bigl((1+\rho)^d - (1-\rho)^d\bigr)|\det d\phi(0)|\,V(R)\\
&\le 2\rho d(1+\rho)^{d-1}|\det d\phi(0)|\,V(R)\\
&\le 2^d\rho\,dK\,V(R).
\end{aligned}
\tag{10.5.4}
\]

If we choose

\[
\rho < \frac{\epsilon}{2^d dK},
\]

then it follows from (10.5.4) that

\[
V((1+\rho)\,d\phi(0)R) - V((1-\rho)\,d\phi(0)R) < \epsilon V(R).
\]

The proof will be complete if we can show that, for small enough $\delta$, any rectangle
$R$ containing 0, of diameter less than $\delta$, satisfies

\[
(1-\rho)\,d\phi(0)R \subset \phi(R) \subset (1+\rho)\,d\phi(0)R, \tag{10.5.5}
\]

since these containments are also satisfied with $\phi(R)$ replaced by $d\phi(0)R$.

If $x$ is any non-zero vector in $\mathbb{R}^d$, then

\[
\|x\| = \|(d\phi(0))^{-1}d\phi(0)x\| \le \|(d\phi(0))^{-1}\|\,\|d\phi(0)x\|.
\]

Thus, $\|d\phi(0)x\| \ge \|(d\phi(0))^{-1}\|^{-1}\|x\|$. In other words, if $L$ is any line segment in
$\mathbb{R}^d$, then the length of the line segment $d\phi(0)L$ is at least the factor

\[
A = \|(d\phi(0))^{-1}\|^{-1}
\]

times the length of $L$. It follows that the distance from $d\phi(0)R$ to the complement
of $(1+\rho)\,d\phi(0)R$ is at least $A\rho r$, where $r$ is one half the length of the shortest edge
of $R$. By the definition of the differential $d\phi(0)$, we may choose $\delta$ such that $\|x\| < \delta$
and $x \in R$ implies

\[
\|\phi(x) - d\phi(0)x\| \le A\rho\lambda\|x\| \le A\rho r.
\]

This implies that $\phi(x) \in (1+\rho)\,d\phi(0)R$. A similar argument shows that, with $\delta$
chosen as above, $x \in R$ implies that $(1-\rho)\,d\phi(0)x \in \phi(R)$. Hence, (10.5.5) holds if
$R$ has diameter less than $\delta$. This completes the proof. □

Theorem 10.5.10. Let $U$ be an open subset of $\mathbb{R}^p$ and let $\phi : U \to \mathbb{R}^p$ be a smooth
one-to-one transformation with $d\phi$ non-singular at each point of $U$. Let $R$ be an
aligned rectangle in $U$ and let $f$ be a continuous function on $\phi(R)$. Then

\[
\int_{\phi(R)} f(u)\,dV(u) = \int_R f(\phi(x))\,|\det d\phi(x)|\,dV(x).
\]




Proof. For each subrectangle $S$ of $R$ we set

\[
\Delta(S) = \int_{\phi(S)} f(u)\,dV(u) - \int_S f(\phi(x))\,|\det d\phi(x)|\,dV(x)
\quad\text{and}\quad
Q(S) = \frac{\Delta(S)}{V(S)}.
\]

To prove the theorem, we need to show that $\Delta(R) = 0$. This is equivalent to
showing that $Q(R) = 0$.

Let $h$ be the diameter of $R$. We will choose inductively a downwardly nested
sequence $\{R_i\}_{i=0}^{\infty}$ of subrectangles of $R$ in such a way that $R_i$ has diameter $h/2^i$
and $|Q(R_i)| \ge |Q(R)|$. We begin by setting $R_0 = R$.

Suppose $R_0,\dots,R_m$ have been chosen in such a way that the conditions of the
previous paragraph are met. If $R_m = [a_1,b_1] \times \cdots \times [a_p,b_p]$, we partition $R_m$ by
partitioning each interval $[a_k,b_k]$ into two subintervals of equal length. There are
$2^p$ subrectangles of $R_m$ for this partition and each of them has diameter $h/2^{m+1}$
since $R_m$ has diameter $h/2^m$. If $\{S_1,\dots,S_n\}$ is a list of these subrectangles of $R_m$,
then $R_m = \bigcup_j S_j$ and

\[
\Delta(R_m) = \sum_{j=1}^n \Delta(S_j).
\]

For at least one of the rectangles $S_j$, we must have $|Q(S_j)| \ge |Q(R_m)|$, for if
$|Q(S_j)| < |Q(R_m)|$ for all $j$, then

\[
|\Delta(R_m)| \le \sum_{j=1}^n |Q(S_j)|\,V(S_j) < \sum_{j=1}^n |Q(R_m)|\,V(S_j) = |Q(R_m)|\,V(R_m) = |\Delta(R_m)|,
\]

which is impossible. Thus, for some $j$, we have $|Q(S_j)| \ge |Q(R_m)|$. We choose
$R_{m+1}$ to be an $S_j$ which satisfies this inequality. This proves by induction that a
sequence $\{R_i\}$ with the required properties can be chosen.

Since the sequence $\{R_i\}$ is a downwardly nested sequence of compact sets, it
has a non-empty intersection. Let $a$ be a point in this intersection.

Since $\phi$ is smooth, we may choose a neighborhood $V$ of $a$ in which $|\det d\phi(x)|$
is bounded above by a positive constant $K$. If $\lambda$ is the aspect ratio of $R$, then
each of the rectangles $R_i$ has the same aspect ratio. By the previous lemma, given
$\epsilon > 0$, there is a $\delta > 0$ such that each rectangle $R$ in $V$ with aspect ratio at least $\lambda$ and with
diameter less than $\delta$ and with center $b$ satisfies

\[
|V(\phi(R)) - V(d\phi(b)R)| \le \epsilon V(R). \tag{10.5.6}
\]

Since $V(d\phi(b)R) = |\det d\phi(b)|\,V(R)$, this implies

\[
V(\phi(R)) \le (|\det d\phi(b)| + \epsilon)V(R). \tag{10.5.7}
\]


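These conditions will be met for all $R_j$ with $R_j \subset B_\delta(a)$. We will denote the
center of $R_j$ by $a_j$. If we also choose $\delta$ small enough that

\[
|f(\phi(x)) - f(\phi(y))| < \epsilon \quad\text{and}\quad \bigl|f(\phi(x))\,|\det d\phi(x)| - f(\phi(y))\,|\det d\phi(y)|\bigr| < \epsilon
\]

for all $x, y \in B_\delta(a)$, then

\[
\begin{aligned}
|\Delta(R_j)| &= \left|\int_{\phi(R_j)} f(u)\,dV(u) - \int_{R_j} f(\phi(x))\,|\det d\phi(x)|\,dV(x)\right|\\
&\le \int_{\phi(R_j)} |f(u) - f(\phi(a_j))|\,dV(u)\\
&\qquad + \left|\int_{\phi(R_j)} f(\phi(a_j))\,dV(u) - \int_{R_j} f(\phi(a_j))\,|\det d\phi(a_j)|\,dV(x)\right|\\
&\qquad + \int_{R_j} \bigl|f(\phi(a_j))\,|\det d\phi(a_j)| - f(\phi(x))\,|\det d\phi(x)|\bigr|\,dV(x)\\
&\le \epsilon V(\phi(R_j)) + |f(\phi(a_j))|\,|V(\phi(R_j)) - V(d\phi(a_j)R_j)| + \epsilon V(R_j).
\end{aligned}
\]

If we apply (10.5.6) and (10.5.7), with $b = a_j$ and $R = R_j$, to this inequality, we
conclude that

\[
|\Delta(R_j)| \le \epsilon V(R_j)\bigl(|f(\phi(a_j))| + |\det d\phi(a_j)| + \epsilon + 1\bigr).
\]

Since $\epsilon$ was arbitrary and $\phi(a_j) \to \phi(a)$ and $d\phi(a_j) \to d\phi(a)$ as $j \to \infty$, this implies
that $|Q(R_j)| = |\Delta(R_j)|/V(R_j)$ can be made smaller than any positive number by
choosing $j$ large enough. Since $|Q(R)| \le |Q(R_j)|$ for all $j$, this implies that $Q(R) = 0$,
as required. □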


This has the following corollary, the proof of which is left to the exercises. 


Corollary 10.5.11. Let $U$ be an open subset of $\mathbb{R}^d$ and let $\phi : U \to \mathbb{R}^d$ be a smooth
one-to-one transformation with non-singular differential on $U$. If $R$ is an aligned
rectangle in $U$, then

\[
V(\phi(R)) = \int_R |\det d\phi(x)|\,dV(x).
\]

Furthermore, if $M = \sup_R |\det d\phi|$ and $m = \inf_R |\det d\phi|$, then

\[
mV(R) \le V(\phi(R)) \le MV(R).
\]
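As an illustration of this corollary, consider the polar coordinate map $\phi(r,\theta) = (r\cos\theta, r\sin\theta)$, for which $|\det d\phi(r,\theta)| = r$, on the rectangle $R = [1,2] \times [0,\pi/2]$. This example and all names in the Python sketch below are our own. The sketch compares the integral of $|\det d\phi|$ over $R$ with an independent estimate of the area of $\phi(R)$ obtained from a polygon inscribed along its boundary.

```python
import math

def phi(r, t):
    """Polar coordinate map; its differential has determinant r."""
    return (r * math.cos(t), r * math.sin(t))

def shoelace_area(pts):
    """Area of a simple polygon from its vertices listed in order."""
    n = len(pts)
    s = sum(pts[k][0] * pts[(k + 1) % n][1] - pts[(k + 1) % n][0] * pts[k][1]
            for k in range(n))
    return abs(s) / 2

def image_boundary(r0, r1, t0, t1, n=500):
    """Points of phi(boundary of R), for R = [r0, r1] x [t0, t1], in order."""
    pts = []
    pts += [phi(r0 + (r1 - r0) * k / n, t0) for k in range(n)]   # bottom edge
    pts += [phi(r1, t0 + (t1 - t0) * k / n) for k in range(n)]   # right edge
    pts += [phi(r1 - (r1 - r0) * k / n, t1) for k in range(n)]   # top edge
    pts += [phi(r0, t1 - (t1 - t0) * k / n) for k in range(n)]   # left edge
    return pts

r0, r1, t0, t1 = 1.0, 2.0, 0.0, math.pi / 2
# Integral of |det d(phi)| = r over R; the r-integral is done in closed form.
integral_of_det = (r1**2 - r0**2) / 2 * (t1 - t0)
# Independent estimate of V(phi(R)) from the polygon along its boundary.
polygon_area = shoelace_area(image_boundary(r0, r1, t0, t1))
volume_R = (r1 - r0) * (t1 - t0)
m, M = r0, r1      # inf and sup of |det d(phi)| = r on R
```

Both computations give approximately $3\pi/4$, the area of a quarter annulus, and the value lies between `m * volume_R` and `M * volume_R`.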


Integral over the Smooth Image of a Jordan Region. We can now prove the 
general change of variables formula. The proof uses the following lemma, which 
follows easily from the previous corollary. The proof is left to the exercises. 


Lemma 10.5.12. If $\phi : U \to \mathbb{R}^d$ is a smooth one-to-one function with $d\phi$
non-singular on $U$ and if $K \subset U$ is a compact set of volume 0, then $\phi(K)$ is also a set
of volume 0.


Theorem 10.5.13. Let $A$ be a compact Jordan region contained in an open set
$U \subset \mathbb{R}^d$. Let $\phi : U \to \mathbb{R}^d$ be a smooth one-to-one function with a differential
which is non-singular on $A$, and let $f$ be a function which is bounded on $\phi(A)$ and
continuous except on a subset $E$ of $\phi(A)$ of volume 0. Then $\phi(A)$ is a Jordan
region, $f$ is integrable on $\phi(A)$, $f \circ \phi$ is integrable on $A$, and

\[
\int_{\phi(A)} f(u)\,dV(u) = \int_A f(\phi(x))\,|\det d\phi(x)|\,dV(x).
\]




Proof. Let $V = \phi(U)$. By the Inverse Function Theorem, $V$ is an open set and
$\phi^{-1} : V \to U$ is a smooth function with non-singular differential.

The boundary of $A$ is a set of volume 0 since $A$ is a Jordan region. Since $\phi$ and
$\phi^{-1}$ are both continuous, $\partial\phi(A) = \phi(\partial A)$. It follows from the previous lemma that
$\partial\phi(A)$ is also a set of volume 0 and, hence, that $\phi(A)$ is a Jordan region. Hence,
we may extend $f$ to be 0 on the complement of $\phi(A)$ in $V$ and it will still be a
function which is continuous except on a set of volume 0. It follows from Theorem
10.3.5 that $f$ is integrable on $\phi(A)$.

Let $K'$ be the closure of $\partial\phi(A) \cup E$. Then $f$, extended to be 0 on the complement
of $\phi(A)$, is continuous on the complement of $K'$. The set $K'$ has volume 0. Hence,
by the previous lemma, $K = \phi^{-1}(K')$ is a set of volume 0. Since $f \circ \phi$ is continuous on
$U$ except at points of $K$, it follows that $f \circ \phi$ is integrable on $A$.

Let $\epsilon$ be any positive number. Let $R$ be a rectangle containing $A$ and let $P$ be
a partition of $R$. We choose $P$ so that $R_1, R_2, \dots, R_n$ is a list of those rectangles
for this partition which are contained in $U$. If the partition is fine enough, then it
will be true that $A \subset \bigcup_j R_j$. Also, the partition may be chosen fine enough that,
if $S$ is the set of $j$ for which $R_j \cap K \ne \emptyset$, then

\[
\sum_{j \in S} V(R_j) < \epsilon.
\]

If $K \cap R_j = \emptyset$, then either $A \cap R_j = \emptyset$ or $R_j$ is a rectangle contained in the interior
of $A$ and $f$ is continuous on $\phi(R_j)$. If the latter is true, then

\[
\int_{\phi(R_j)} f(u)\,dV(u) = \int_{R_j} f(\phi(x))\,|\det d\phi(x)|\,dV(x).
\]

Since $f$ is 0 on the complement of $\phi(A)$, we have

\[
\begin{aligned}
&\left|\int_{\phi(A)} f(u)\,dV(u) - \int_A f(\phi(x))\,|\det d\phi(x)|\,dV(x)\right|\\
&\qquad = \left|\sum_{j} \left(\int_{\phi(R_j)} f(u)\,dV(u) - \int_{R_j} f(\phi(x))\,|\det d\phi(x)|\,dV(x)\right)\right|\\
&\qquad = \left|\sum_{j \in S} \left(\int_{\phi(R_j)} f(u)\,dV(u) - \int_{R_j} f(\phi(x))\,|\det d\phi(x)|\,dV(x)\right)\right|\\
&\qquad \le \sum_{j \in S} \left(\int_{\phi(R_j)} |f(u)|\,dV(u) + \int_{R_j} |f(\phi(x))|\,|\det d\phi(x)|\,dV(x)\right)\\
&\qquad \le \sum_{j \in S} \bigl(MV(\phi(R_j)) + MNV(R_j)\bigr) \le 2MN\epsilon,
\end{aligned}
\]

where $M = \sup_x |f(\phi(x))|$ and $N = \sup_x |\det d\phi(x)|$. Since $\epsilon$ is arbitrary, this
implies the equality of the theorem. □
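The change of variables formula can be tested numerically in a familiar case. The Python sketch below takes $\phi$ to be the polar coordinate map on $A = [1,2] \times [0,\pi/2]$, so that $\phi(A)$ is a quarter annulus, and $f(u) = u_1^2 + u_2^2$; these choices, the midpoint rule, and all names are our own. The left side is computed by extending $f$ to be 0 off $\phi(A)$ inside a square, as in the proof above.

```python
import math

def midpoint_2d(h, x0, x1, y0, y1, n=300):
    """Midpoint-rule approximation of the integral of h over [x0,x1] x [y0,y1]."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            total += h(x, y0 + (j + 0.5) * hy)
    return total * hx * hy

def f(u1, u2):
    return u1 * u1 + u2 * u2

# Right side: integrate f(phi(r, t)) * |det d(phi)(r, t)| over A, where
# phi(r, t) = (r cos t, r sin t) and |det d(phi)| = r.
def pulled_back(r, t):
    u1, u2 = r * math.cos(t), r * math.sin(t)
    return f(u1, u2) * abs(r)

right = midpoint_2d(pulled_back, 1.0, 2.0, 0.0, math.pi / 2)

# Left side: integrate f over phi(A), extending f by 0 off phi(A) inside
# the square [0, 2] x [0, 2].
def f_on_image(u1, u2):
    inside = 1.0 <= u1 * u1 + u2 * u2 <= 4.0
    return f(u1, u2) if inside else 0.0

left = midpoint_2d(f_on_image, 0.0, 2.0, 0.0, 2.0)
exact = 15 * math.pi / 8      # (pi/2) * integral of r^3 from 1 to 2
```

The value `right` matches `exact` closely, while `left` agrees only to within the larger error caused by the jump of the extended integrand across the boundary of $\phi(A)$.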


With some additional hypotheses, the above theorem can be strengthened so as
to apply to integrals over the full open sets $U$ and $\phi(U)$ rather than just to integrals
over compact subsets. The next theorem is such a result.




Theorem 10.5.14. Let $U$ be an open Jordan region in $\mathbb{R}^d$ and let $\phi : U \to \mathbb{R}^d$ be
a one-to-one smooth function on $U$ with image $\phi(U)$ which is also a Jordan region.
Suppose $d\phi$ is non-singular on $U$ and $f$ is bounded on $\phi(U)$ and continuous except
on a subset of volume 0. Then $f$ is integrable on $\phi(U)$. If, in addition, $f \circ \phi\,|\det d\phi|$
is bounded on $U$, then it too is integrable on $U$ and

$$\int_{\phi(U)} f(u)\,dV(u) = \int_U f(\phi(x))\,|\det d\phi(x)|\,dV(x).$$


Proof. Since $d\phi$ is non-singular on $U$, Theorem 9.6.5 implies that $\phi : U \to \mathbb{R}^d$ is a
one-to-one open map onto an open set $V$.

Since $f$ is bounded on $\phi(U)$ and continuous except on a set of volume 0, it
is integrable on $\phi(U)$. The function $g(x) = f(\phi(x))\,|\det d\phi(x)|$ is continuous and
bounded and, hence, is an integrable function on $U$.

Let $A_n$ be a sequence of compact Jordan subsets of $U$ such that $\bigcup_n A_n^{\circ} = U$.
Such a sequence exists by Exercise 10.3.12. Then, by Exercise 10.3.1,

$$\int_U g(x)\,dV(x) = \lim_{n \to \infty} \int_{A_n} g(x)\,dV(x). \tag{10.5.8}$$

Also, since $\{\phi(A_n)\}$ is a sequence of compact subsets of $V = \phi(U)$ with the union
of the interiors of the sets in the sequence equal to $V$, we conclude

$$\int_V f(u)\,dV(u) = \lim_{n \to \infty} \int_{\phi(A_n)} f(u)\,dV(u). \tag{10.5.9}$$

The previous theorem implies that

$$\int_{\phi(A_n)} f(u)\,dV(u) = \int_{A_n} g(x)\,dV(x)$$

for each $n$. This, together with (10.5.8) and (10.5.9), completes the proof. □


The change of variables theorem has the following corollary, the proof of which 
is left to the exercises. 


Corollary 10.5.15. Let $U$ be an open Jordan region in $\mathbb{R}^d$ and let $\phi : U \to \mathbb{R}^d$ be
a function satisfying the conditions of the previous theorem. Then

$$V(\phi(U)) = \int_U |\det d\phi(x)|\,dV(x).$$


Note that, in the change of variables formulas in the above theorem and its
corollary, the sets $U$ and $\phi(U)$ may be replaced by their closures, even though the
transformation $\phi$ may not be defined on the closure of $U$. This is due to the fact
that the boundaries of $U$ and $\phi(U)$ have volume 0.


Example 10.5.16. Use the preceding corollary to find the area enclosed by an
ellipse with major and minor axes of lengths $2a$ and $2b$ without assuming knowledge
of the area of a circle.

Solution: Such an ellipse has equation $x^2/a^2 + y^2/b^2 = 1$. The region it
encloses is the image of the rectangle $A = \{(r,\theta) : 0 \le r \le 1,\ 0 \le \theta \le 2\pi\}$ under the
transformation $\phi(r,\theta) = (ar\cos\theta, br\sin\theta)$. The differential of this map is

$$d\phi(r,\theta) = \begin{pmatrix} a\cos\theta & -ar\sin\theta \\ b\sin\theta & br\cos\theta \end{pmatrix}.$$

The determinant of this matrix is $abr$, which is non-zero except at $r = 0$. Thus, the
function $\phi$ is one-to-one and smooth with non-singular differential on the interior
of the rectangle $A$. The interior of $A$ is taken by $\phi$ to the interior of the ellipse with
the line segment joining $(0,0)$ to $(a,0)$ removed. This set differs from the ellipse itself by a
set of volume 0. Thus, the area we seek is, by the previous corollary and Fubini's
Theorem,

$$\int_0^{2\pi} \int_0^1 abr\,dr\,d\theta = \pi ab.$$
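The computation in Example 10.5.16 can be checked numerically. The following is only a sketch, assuming a plain midpoint rule on the rectangle $0 \le r \le 1$, $0 \le \theta \le 2\pi$; the function names `jacobian_det` and `ellipse_area` are illustrative and do not come from the text.

```python
import math

def jacobian_det(a, b, r, theta):
    """Determinant of d(phi) for phi(r, theta) = (a r cos(theta), b r sin(theta))."""
    # Entries of the 2 x 2 matrix of partial derivatives.
    dx_dr, dx_dtheta = a * math.cos(theta), -a * r * math.sin(theta)
    dy_dr, dy_dtheta = b * math.sin(theta), b * r * math.cos(theta)
    return dx_dr * dy_dtheta - dx_dtheta * dy_dr

def ellipse_area(a, b, n=200):
    """Midpoint-rule approximation of the integral of |det d(phi)| over [0,1] x [0,2*pi]."""
    dr = 1.0 / n
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            total += abs(jacobian_det(a, b, r, theta)) * dr * dtheta
    return total

# Compare with the closed form pi * a * b for semi-axes a = 3, b = 2.
area = ellipse_area(3.0, 2.0)
print(area, math.pi * 3.0 * 2.0)
```

Because the integrand $abr$ is linear in $r$ and independent of $\theta$, the midpoint rule reproduces $\pi ab$ up to rounding error here; for a general transformation it would only give an approximation.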


Example 10.5.17. Find $\displaystyle\int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \cos(x^2 + y^2)\,dy\,dx$.

Solution: By Fubini's Theorem, this integral is

$$\int_D \cos(x^2 + y^2)\,dV(x,y),$$

where $D = B_1(0,0)$. If we change to polar coordinates using the transformation

$$\phi(r,\theta) = (r\cos\theta, r\sin\theta),$$

then $\det d\phi(r,\theta) = r$ and $D = \phi(R)$, where $R$ is the rectangle $[0,1] \times [0,2\pi]$. On
$R$, $\phi$ is smooth with non-singular differential except when $r = 0$, and so Theorem
10.5.14 applies with $U = R^{\circ}$. Hence,

$$\int_{\phi(R)} \cos(x^2 + y^2)\,dV(x,y) = \int_R \cos(r^2)\,r\,dr\,d\theta.$$

Applying Fubini's Theorem again yields

$$\int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \cos(x^2 + y^2)\,dy\,dx = \int_0^{2\pi} \int_0^1 \cos(r^2)\,r\,dr\,d\theta = \pi \sin 1.$$


Exercise Set 10.5 


1. Compute the inverse of each elementary matrix $E_{ij}$, $S_{ij}$, and $T_i(a)$. Show that
each inverse is itself an elementary matrix or a product of elementary matrices.


2. Show that if $F$ is a Jordan region and $L$ is a linear transformation whose matrix
is singular, then $L(F)$ has volume 0.

3. Let $u$ and $v$ be two vectors in the plane and define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by $\phi(s,t) =
su + tv$. Let $A$ be the parallelogram which is the image of $[0,1] \times [0,1]$ under
$\phi$. If $f$ is a continuous function on $A$, express $\int_A f(x,y)\,dV(x,y)$ as an integral
over $[0,1] \times [0,1]$.

4. Use the result of the previous exercise to find a formula for the area of the 
parallelogram determined by two vectors u and v. 


5. An orthogonal transformation is a linear transformation $A$ that preserves inner
products, that is, $Au \cdot Av = u \cdot v$ for each pair of vectors $u, v$. Note that
a rotation is an orthogonal transformation. Prove that a $d \times d$ orthogonal
transformation preserves volume in $\mathbb{R}^d$.

6. Compute $\displaystyle\int_0^{\cdots} \int_0^{\cdots} \cdots\, dy\,dx$.

7. Let $A = \{(x,y) \in \mathbb{R}^2 : x > 0,\ y > 0,\ x^2 + y^2 < 4,\ x^2 - y^2 > 1\}$. Compute
$\int_A \cdots\, dV(x,y)$
by making a change of variables $u = x^2 + y^2$, $v = x^2 - y^2$ for $x > \cdots$.

8. Compute the volume of a sphere $S$ of radius $r$ by computing the integral
$\int_S 1\,dV(x)$.
Compute this integral by first converting to spherical coordinates.

9. Compute the volume of a right circular cone with height $h$ and radius $a$. Hint:
Such a cone can be described in cylindrical coordinates as the set of points
$\{(r,\theta,z) : 0 \le r \le a(h - z)/h,\ 0 \le z\}$.
Here $x = r\cos\theta$, $y = r\sin\theta$, $z = z$ describes the transformation from cylindrical
to rectangular coordinates.

10. Show by example that the conclusion of Theorem 10.5.13 does not hold if the
function $\phi$ is not one-to-one on $A$.

11. Prove Corollary 10.5.11.

12. Prove Lemma 10.5.12.
Chapter 11 


Vector Calculus 


Previous chapters have dealt with integration over intervals on the line and over
Jordan domains in $\mathbb{R}^d$. In this chapter, we study integration over curves and surfaces
in $\mathbb{R}^d$. Here, the surfaces involved could be ordinary two-dimensional surfaces in
$\mathbb{R}^3$, but they might be $p$-surfaces in $\mathbb{R}^d$ for any $p \le d$. In this study, the objects to
be integrated are no longer functions, but closely related objects called differential
forms. Differential forms, like surfaces, have a dimension. Thus, there is a notion of
a $p$-form for each non-negative integer $p$. When a differential form is integrated over
a surface, the dimensions must match. Thus, we integrate $p$-forms over $p$-surfaces.


The culmination of this study is a series of very powerful theorems (Green's
Theorem, Gauss's Theorem, Stokes's Theorem) which are really all special cases
of one very general theorem, which is also usually called Stokes's Theorem.


11.1. 1-forms and Path Integrals 


We begin with the one-dimensional case: curves and integration of 1-forms over 
curves. 


Smooth Curves. Recall from Section 9.4 that a curve in $\mathbb{R}^d$ is a continuous
function $\gamma : I \to \mathbb{R}^d$ which has an interval $I$ on the line as its domain. The interval
$I$ is called the parameter interval for the curve. We will be focusing on curves which
have a derivative $\gamma'$ on the interior of $I$. The derivative is defined in the usual way:

$$\gamma'(t) = \lim_{s \to t} \frac{\gamma(s) - \gamma(t)}{s - t}.$$

Note that if the curve $\gamma(t)$ is expressed in terms of its coordinate functions,
$\gamma(t) = (\gamma_1(t), \gamma_2(t), \ldots, \gamma_d(t))$, then $\gamma'(t) = (\gamma_1'(t), \gamma_2'(t), \ldots, \gamma_d'(t))$.


Figure 11.1.1. A Smooth Curve in $\mathbb{R}^2$.


Definition 11.1.1. A smooth curve $\gamma$ is a curve with a bounded, continuous
derivative $\gamma'$ on the interior of its parameter interval $I$.


The trace of a curve $\gamma$ with parameter interval $I$ is its image $\gamma(I)$ in $\mathbb{R}^d$. A
curve is said to lie in the subset $F$ of $\mathbb{R}^d$ if its trace is contained in $F$.

Example 11.1.2. Find a smooth curve which traces a straight line from $u$ to $v$ in
$\mathbb{R}^d$. What is the derivative of this curve?

Solution: The curve $\gamma$, defined by $\gamma(t) = u + t(v - u)$, $t \in [0,1]$, begins at
$u = \gamma(0)$, moves in the direction of the vector $v - u$ as $t$ increases, and ends at
$v = \gamma(1)$. The derivative of $\gamma$ is the constant vector $\gamma'(t) = v - u$.


Piecewise Smooth Curves — Paths. We will also need to consider curves which 
are only piecewise smooth — that is, curves which have a bounded, continuous 
derivative except at finitely many points of the parameter interval J. The precise 
definition is as follows: 


Definition 11.1.3. Let $\gamma : I \to \mathbb{R}^d$ be a curve. We will say that $\gamma$ is piecewise
smooth if there is a partition $a = t_0 < t_1 < t_2 < \cdots < t_n = b$ of $I$ such that,
for each $j$, the restriction of $\gamma$ to the subinterval $[t_{j-1}, t_j]$ is a smooth curve. A
piecewise smooth curve will also be called a path.


If $\gamma$ is a path as described above, then $\gamma'$ exists and is continuous and bounded
on $I \setminus \{t_0, \ldots, t_n\}$.

One may think of a path as finitely many smooth curves which join together 
to form a single curve which is smooth everywhere except at points where two of 
the original curves join. At such points the curve may abruptly change direction. 


Example 11.1.4. Find a path that traces once around the square with vertices
$(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$ in the counterclockwise direction. Find $\gamma'(t)$ on the
subintervals where $\gamma$ is smooth.


Figure 11.1.2. A Path in $\mathbb{R}^2$.


Solution: We choose $[0,1]$ as the parameter interval and define a path $\gamma$ as
follows:

$$\gamma(t) = \begin{cases}
(4t, 0), & 0 \le t \le 1/4, \\
(1, 4t - 1), & 1/4 \le t \le 1/2, \\
(3 - 4t, 1), & 1/2 \le t \le 3/4, \\
(0, 4 - 4t), & 3/4 \le t \le 1.
\end{cases}$$

This is continuous on $[0,1]$ and smooth on each of the subintervals in the partition
$0 < 1/4 < 1/2 < 3/4 < 1$. It traces each side of the square, moving in the
counterclockwise direction. On the first interval, $\gamma'$ is the constant vector $(4,0)$, on
the second it is $(0,4)$, on the third it is $(-4,0)$, and on the fourth it is $(0,-4)$.


Closed Paths. The preceding example is an example of a closed path, that is,
a path $\gamma$ which begins and ends at the same point. This means that $\gamma(a) = \gamma(b)$,
where $[a,b]$ is the parameter interval. The following is another example of a closed
path:


Example 11.1.5. Find a path which traces once around the circle of radius $r$ in
$\mathbb{R}^2$, centered at $u_0$.


Solution: The circle of radius $r$ centered at $u_0$ consists of all points $u$ in $\mathbb{R}^2$ with
$\|u - u_0\| = r$. A parameterized curve which traverses this set
once in the counterclockwise direction is given by $\gamma(t) = u_0 + (r\cos t, r\sin t)$ for
$0 \le t \le 2\pi$.


Length of a Path. 


Definition 11.1.6. The length of a path $\gamma : [a,b] \to \mathbb{R}^d$ is the number $\ell(\gamma)$ defined
by

$$\ell(\gamma) = \int_a^b \|\gamma'(t)\|\,dt.$$

Note that the integral in this definition exists because $\|\gamma'(t)\|$ is bounded and
it is continuous except at finitely many points on $[a,b]$.




Example 11.1.7. Find the length of the path in $\mathbb{R}^2$ given by $\gamma(t) = (2t^3, 3t^2)$ for
$t \in [0,1]$.

Solution: Since $\gamma'(t) = (6t^2, 6t)$ and $\|\gamma'(t)\| = \sqrt{36t^4 + 36t^2} = 6t\sqrt{t^2 + 1}$, we
conclude

$$\ell(\gamma) = 6\int_0^1 t\sqrt{t^2 + 1}\,dt = 3\int_1^2 \sqrt{u}\,du = 2u^{3/2}\Big|_1^2 = 2(2\sqrt{2} - 1),$$

where we have made the substitution $u = t^2 + 1$, $du = 2t\,dt$.
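Definition 11.1.6 can also be applied numerically. The sketch below assumes a midpoint rule for the integral of $\|\gamma'(t)\|$ and uses the derivative computed in Example 11.1.7; the name `path_length` is an illustrative choice, not notation from the text.

```python
import math

def path_length(gamma_prime, a, b, n=10000):
    """Midpoint-rule approximation of the integral of ||gamma'(t)|| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        vx, vy = gamma_prime(t)
        total += math.hypot(vx, vy) * h
    return total

def gamma_prime(t):
    # Derivative of gamma(t) = (2 t^3, 3 t^2) from Example 11.1.7.
    return (6.0 * t * t, 6.0 * t)

approx = path_length(gamma_prime, 0.0, 1.0)
exact = 2.0 * (2.0 * math.sqrt(2.0) - 1.0)
print(approx, exact)
```

The two printed values should agree to several decimal places, which gives an independent check on the substitution used in the example.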


Differential 1-Forms. Recall from Chapter 9 that if $F$ is a differentiable function
from an open subset of $\mathbb{R}^p$ to $\mathbb{R}^q$, then its differential $dF(x)$ at a point $x$ is a
linear transformation from $\mathbb{R}^p$ to $\mathbb{R}^q$ and, as such, may be represented by a $q \times p$
matrix (the matrix of partial derivatives of the coordinate functions). In particular,
an $\mathbb{R}$-valued function $f$ on an open subset of $\mathbb{R}^d$ has differential $df(x)$ at a point $x$ in
its domain which is a linear function from $\mathbb{R}^d$ to $\mathbb{R}$, represented by a $1 \times d$ matrix
(such a thing is just a $d$-vector, but we wish to think of it as a linear transformation
from $\mathbb{R}^d$ to $\mathbb{R}$). A notation for $df$ that was introduced in Section 9.4 is

$$df = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_d}\,dx_d.$$

Here, $dx_j$ may be thought of as the differential of the $j$th coordinate function
$x_j$. When represented as a $1 \times d$ matrix, $dx_j$ is 1 in the $j$th entry and 0 in all
other entries. This determines the linear transformation which sends a vector of
dimension $d$ to its $j$th component. Similarly, $df$ may be represented as the $1 \times d$
matrix which is $\partial f/\partial x_j$ in the $j$th entry for each $j$.

A differential 1-form $\phi$ on a set $E$ in $\mathbb{R}^d$ is just a continuous function which
assigns to each point $x$ of $E$ a linear function $\phi(x) : \mathbb{R}^d \to \mathbb{R}$. Since the $dx_j$ form a
basis for the vector space of such functions, each differential form $\phi$ may be written
in the form

$$\phi = \phi_1\,dx_1 + \phi_2\,dx_2 + \cdots + \phi_d\,dx_d,$$

where the functions $\phi_j$ are continuous $\mathbb{R}$-valued functions on $E$. For example, if $E$
is a subset of $\mathbb{R}^2$, then a 1-form on $E$ is an expression of the form $f\,dx + g\,dy$, where
$f$ and $g$ are continuous functions on $E$.

Note that the gradient $df$ of a differentiable function is a special kind of differential
1-form, one where the functions $\phi_j$ are the partial derivatives $\partial f/\partial x_j$ of $f$.


Integration Along a Path. Let $\gamma : [a,b] \to \mathbb{R}^d$ be a path. Since $\gamma$ is a function
from a subset of $\mathbb{R}$ to a subset of $\mathbb{R}^d$, its differential $d\gamma$ is a function which assigns
to a point $t \in [a,b]$ a linear function from $\mathbb{R}$ to $\mathbb{R}^d$, that is, a $d \times 1$ matrix. In fact
this matrix is just the vector $\gamma'(t)$ regarded as a column vector. In this chapter, we
will write

$$d\gamma(t) = \gamma'(t)\,dt,$$

where $dt$ is to be thought of as the differential of the identity function (the function
that sends $t$ to itself) and $\gamma'(t)$ is to be thought of as a $d \times 1$ matrix. This formalism
may seem unnecessarily complicated, but it is very useful in the coming discussions




of transformation laws for paths, differential forms, and integrals under changes of 
variables. 

If $\phi$ is a differential 1-form defined on a set containing the trace of $\gamma$, then
$\phi(\gamma(t))$ acts on $d\gamma(t)$ through matrix multiplication to produce a real number
$\phi(\gamma(t))\,d\gamma(t)$. The resulting real-valued function is a bounded function on $[a,b]$
which is continuous except at finitely many points. We may integrate this function.

The resulting integral has a very important property: it is independent of the
parameterization of the path. We will prove this in the next section. An integral
of this type is called a line integral or path integral. The formal definition is as
follows:


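Definition 11.1.8. If $\phi = \phi_1\,dx_1 + \phi_2\,dx_2 + \cdots + \phi_d\,dx_d$ is a continuous 1-form
defined on a set $A$ in $\mathbb{R}^d$ and if $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_d)$ is a path in $A$ with parameter
interval $[a,b]$, then the integral of $\phi$ over $\gamma$ is defined to be

$$\int_\gamma \phi = \int_a^b \phi(\gamma(t))\,d\gamma(t) = \int_a^b \phi(\gamma(t))\,\gamma'(t)\,dt = \int_a^b \sum_{j=1}^{d} \phi_j(\gamma(t))\,\gamma_j'(t)\,dt.$$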


A useful device for remembering and applying this definition is suggested by
the use of differentials in the change of variable formalism for the Riemann integral:
the $j$th coordinate $x_j$ of a point on the curve $\gamma$ and its formal differential $dx_j$ are
given by

$$x_j = \gamma_j(t), \qquad dx_j = \gamma_j'(t)\,dt. \tag{11.1.1}$$

The formula for the integral given in Definition 11.1.8 is

$$\int_\gamma \bigl(\phi_1(x)\,dx_1 + \cdots + \phi_d(x)\,dx_d\bigr) = \int_a^b \bigl(\phi_1(\gamma(t))\,\gamma_1'(t) + \cdots + \phi_d(\gamma(t))\,\gamma_d'(t)\bigr)\,dt.$$

We may think of the right side of this equation as being obtained from the left side
by making the substitutions (11.1.1).


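Example 11.1.9. Find $\int_\gamma (y\,dx + x\,dy)$ and $\int_\lambda (y\,dx + x\,dy)$ if

$$\gamma(t) = (1 + 2t,\ 1 + 3t) \quad \text{for } 0 \le t \le 1,$$
$$\lambda(t) = (1 + 2t^2,\ 1 + 3t^2) \quad \text{for } 0 \le t \le 1.$$

Solution: On the curve $\gamma$, we have $x = 1 + 2t$, $dx = 2\,dt$, $y = 1 + 3t$, and
$dy = 3\,dt$. Thus,

$$\int_\gamma (y\,dx + x\,dy) = \int_0^1 \bigl((1 + 3t)\,2 + (1 + 2t)\,3\bigr)\,dt = \int_0^1 (5 + 12t)\,dt = 11.$$

On the curve $\lambda$, we have $x = 1 + 2t^2$, $dx = 4t\,dt$, $y = 1 + 3t^2$, and $dy = 6t\,dt$. Thus,

$$\int_\lambda (y\,dx + x\,dy) = \int_0^1 \bigl((1 + 3t^2)\,4t + (1 + 2t^2)\,6t\bigr)\,dt = \int_0^1 (24t^3 + 10t)\,dt = 11.$$

Thus, the two integrals yield the same result. Note that $\gamma$ and $\lambda$ are just different
parameterizations of the straight line joining $(1,1)$ to $(3,4)$.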
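The sum in Definition 11.1.8 translates directly into a short numerical routine. What follows is a sketch under the assumption that a midpoint rule is accurate enough; the helper name `line_integral` and the representation of a 1-form as a function returning its coefficient tuple are illustrative choices, not conventions of the text.

```python
def line_integral(form, gamma, gamma_prime, a, b, n=2000):
    """Approximate the integral over [a, b] of sum_j phi_j(gamma(t)) * gamma_j'(t) dt."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        coeffs = form(gamma(t))      # (phi_1, ..., phi_d) evaluated at gamma(t)
        velocity = gamma_prime(t)    # (gamma_1'(t), ..., gamma_d'(t))
        total += sum(c * v for c, v in zip(coeffs, velocity)) * h
    return total

def form(point):
    # Coefficients of the 1-form y dx + x dy from Example 11.1.9.
    x, y = point
    return (y, x)

integral_gamma = line_integral(
    form, lambda t: (1 + 2 * t, 1 + 3 * t), lambda t: (2.0, 3.0), 0.0, 1.0)
integral_lambda = line_integral(
    form, lambda t: (1 + 2 * t * t, 1 + 3 * t * t), lambda t: (4.0 * t, 6.0 * t), 0.0, 1.0)
print(integral_gamma, integral_lambda)
```

Both values should be close to 11, in agreement with the hand computation for the two parameterizations.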




The Fundamental Theorem of Calculus. A simple consequence of the Funda- 
mental Theorem of Calculus in the context of differential forms and paths is the 
following. 


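Theorem 11.1.10. Let $\gamma$ be a path in $\mathbb{R}^d$ with parameter interval $I = [a,b]$ and let $f$
be a differentiable function on a set containing $\gamma(I)$. Then

$$\int_\gamma df = f(\gamma(b)) - f(\gamma(a)).$$


Proof. First assume the path $\gamma$ is a smooth curve. If $\gamma = (\gamma_1, \ldots, \gamma_d)$, then

$$\int_\gamma df = \int_a^b df(\gamma(t))\,\gamma'(t)\,dt = \int_a^b \sum_{j=1}^{d} \frac{\partial f}{\partial x_j}(\gamma(t))\,\gamma_j'(t)\,dt = \int_a^b (f \circ \gamma)'(t)\,dt = f(\gamma(b)) - f(\gamma(a)),$$

by the Chain Rule and the Fundamental Theorem of Calculus.


The proof in the case where $\gamma$ is not smooth is left to the exercises (Exercise
11.1.9). □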
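Theorem 11.1.10 can be tested numerically on a path with corners. The sketch below uses the square path of Example 11.1.4 and a function $f(x,y) = x^2 y + \sin y$ chosen only for illustration; the midpoint rule and all function names are assumptions of this sketch. The number of subintervals is taken to be a multiple of 12 so that the corners of the path fall on cell boundaries and no midpoint lands on a corner.

```python
import math

def square_path(t):
    """The path of Example 11.1.4 around the unit square, parameterized on [0, 1]."""
    if t <= 0.25:
        return (4 * t, 0.0)
    if t <= 0.5:
        return (1.0, 4 * t - 1)
    if t <= 0.75:
        return (3 - 4 * t, 1.0)
    return (0.0, 4 - 4 * t)

def square_path_prime(t):
    """Derivative of the square path away from the corners t = 1/4, 1/2, 3/4."""
    if t < 0.25:
        return (4.0, 0.0)
    if t < 0.5:
        return (0.0, 4.0)
    if t < 0.75:
        return (-4.0, 0.0)
    return (0.0, -4.0)

def f(x, y):
    return x * x * y + math.sin(y)

def df(x, y):
    # Coefficients of df = (2xy) dx + (x^2 + cos y) dy.
    return (2 * x * y, x * x + math.cos(y))

def integrate_df(a, b, n=3000):
    """Midpoint-rule approximation of the integral of df along the path for a <= t <= b."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        x, y = square_path(t)
        fx, fy = df(x, y)
        vx, vy = square_path_prime(t)
        total += (fx * vx + fy * vy) * h
    return total

# Three sides of the square: from (0,0) to (0,1) the long way around.
partial = integrate_df(0.0, 0.75)
endpoint_difference = f(*square_path(0.75)) - f(*square_path(0.0))
# The full closed path.
closed = integrate_df(0.0, 1.0)
print(partial, endpoint_difference, closed)
```

The first two printed numbers should agree, and the third should be close to 0, since a closed path begins and ends at the same point.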


Simple Paths and Smooth Simple Paths. A path $\gamma$ with parameter interval
$I$ is said to be simple if it satisfies the following two conditions:


(1) if $s$ and $t$ are distinct points of $I$ which are not both endpoints of $I$, then
$\gamma(s) \neq \gamma(t)$;

(2) $\gamma'$ not only exists but is non-vanishing at all but finitely many points of the
interior of $I$.


The first condition says that $\gamma$ is one-to-one, except that we allow the endpoints of
$I$ to be sent to the same point in the case of a closed path. The second condition
says that $\gamma : I \to \mathbb{R}^d$ has a well-defined tangent line at all but finitely many interior
points of $I$. Intuitively, a simple path is one which does not cross itself or retrace
portions of itself and has a tangent line at all but finitely many points. A simple
closed path is a closed path which is simple, for example, a circle traversed once.

A smooth simple curve $\gamma$ is a simple curve which is smooth and which has
$\gamma'(t) \neq 0$ at each interior point of $I$. This means that the tangent vector $T(t) =
\gamma'(t)/\|\gamma'(t)\|$ is defined at each such point. Note that, since a smooth curve may
not be simple (it may cross itself), there may be more than one tangent vector
at a given point of the trace $\gamma(I)$ of $\gamma$; however, these will correspond to different
parameter values. A smooth simple curve has a well-defined tangent vector at each
point of $\gamma(I)$ except possibly at $\gamma(a)$ or $\gamma(b)$.


Exercise Set 11.1 


1. Find a smooth curve in $\mathbb{R}^2$ which traces the straight line from $(1,2)$ to $(3,0)$.

2. Graph the spiral curve in $\mathbb{R}^2$ defined by $\gamma(t) = (t\cos t, t\sin t)$, $0 \le t \le 4\pi$, and
then find its length.

3. Find the length of the curve $\gamma(t) = (t, t^2/2)$, $0 \le t \le 1$.

4. If $\phi$ is the 1-form $\phi(x,y) = x\,dx + y\,dy$ and $\gamma$ is the curve $\gamma(t) = \ldots$,
$0 \le t \le 1$, then find $\int_\gamma \phi$.

5. If $\phi$ is the 1-form $\phi(x,y) = xy\,dx - 2x^2\,dy$ and $\gamma$ is the curve $\gamma(t) = (\cos t, \sin t)$,
$0 \le t \le \pi/2$, then find $\int_\gamma \phi$.

6. In $\mathbb{R}^3$ let $\phi$ be the 1-form $\phi(x,y,z) = z^2\,dx + y^2\,dy + dz$. Find $\int_\gamma \phi$ if $\gamma(t) =
(\cos(2\pi t), \sin(2\pi t), t - t^2)$, $0 \le t \le 1$.

7. In $\mathbb{R}^3$, let $\phi$ be the 1-form $\phi = \sin z\,dx + \cos z\,dy + y^2\,dz$ and let $\gamma$ be the smooth
curve $\gamma(t) = (\cos t, \sin t, t)$, $0 \le t \le 2\pi$. Describe $\gamma(I)$ and find $\int_\gamma \phi$.

8. If $\gamma : [0,1] \to \mathbb{R}^d$ is a path, set $-\gamma(t) = \gamma(1 - t)$, that is, $-\gamma$ is $\gamma$ traversed
backwards. Show that

$$\int_{-\gamma} \phi = -\int_\gamma \phi$$

for any 1-form $\phi$ defined on the trace of $\gamma$.

9. Theorem 11.1.10 was proved in the case where $\gamma$ is smooth. Use this to prove
that the theorem also holds in the case where $\gamma$ is not smooth, that is, the
case where it is made up of several smooth curves joined together.

10. Prove that if $\gamma$ is a closed path and $f$ is a smooth function defined on an open
set containing the trace of $\gamma$, then $\int_\gamma df = 0$.


11.2. Change of Variables 


There are some arbitrary choices made in our descriptions of paths and 1-forms in
the previous section. A path $\gamma$ comes with a choice of parameterization. Does the
integral along this path depend on the choice of parameterization or is it only the
trace $\gamma(I)$ that is important and we are free to parameterize it any way we wish?
Also, the descriptions of paths and 1-forms in $\mathbb{R}^d$ involve a choice of a coordinate
system for $\mathbb{R}^d$. If this is changed, the expression for a path will change in accordance
with this change of coordinates. How should the expression for a 1-form change in
order that the integral remains the same? These are crucial questions. Their
resolution is the key ingredient in the proofs of the main theorems of this chapter.


Parameter Independence. The equality of the integrals in Example 11.1.9 is 
not an accident. The integral of a 1-form over a path is essentially independent of 
how the path is parameterized. The precise statement of this independence is the 
next theorem. First we give a definition: 


Definition 11.2.1. Suppose $\gamma$ and $\lambda$ are smooth curves in $\mathbb{R}^d$ with parameter
intervals $[a,b]$ and $[c,d]$, respectively. Let $\alpha$ be a continuous function from $[c,d]$
onto $[a,b]$ which is smooth with non-vanishing derivative on $(c,d)$. If $\lambda = \gamma \circ \alpha$,
then we will say that $\alpha$ determines a smooth parameter change from $\gamma$ to $\lambda$. If, in
addition, $\alpha' > 0$ on $(c,d)$, then we will say that $\alpha$ is orientation preserving. On the
other hand, if $\alpha' < 0$ on $(c,d)$, we will say that $\alpha$ is orientation reversing.




Note that, since $\alpha'(t) \neq 0$ for all $t \in (c,d)$, $\alpha'$ is either everywhere positive
or everywhere negative on $(c,d)$ by the Intermediate Value Theorem (Theorem
3.2.3) applied to $\alpha'$. This, in turn, implies that $\alpha$ is either increasing on $[c,d]$ or
decreasing on $[c,d]$ (recall that such a function is said to be strictly monotone on
$[c,d]$).

Intuitively, a smooth parameter change replaces $\gamma$ with a new path $\lambda$ which
traverses the same trace, moving consistently in either the same direction or the
reverse direction of the original path $\gamma$.


Theorem 11.2.2. Suppose $\gamma$ and $\lambda$ are smooth curves in $\mathbb{R}^d$ with parameter
intervals $[a,b]$ and $[c,d]$, respectively, and suppose $\alpha$ determines a smooth parameter
change from $\gamma$ to $\lambda$. Then

$$\int_\lambda \phi = \pm \int_\gamma \phi$$

for each 1-form $\phi = \phi_1\,dx_1 + \cdots + \phi_d\,dx_d$ defined on a set containing the common
trace of $\gamma$ and $\lambda$. The factor of $\pm 1$ that appears on the right in this equality will be
positive if $\alpha$ is orientation preserving and negative if it is orientation reversing.


Proof. This is a simple application of the Chain Rule and the change of variable
formula for integrals on the line. Suppose first that $\alpha$ is orientation preserving. By
the Chain Rule, we have $d\lambda(t) = d\gamma(\alpha(t))\,d\alpha(t) = \gamma'(\alpha(t))\,\alpha'(t)\,dt$, and so

$$\int_\lambda \phi = \int_c^d \phi(\lambda(t))\,d\lambda(t) = \int_c^d \phi(\gamma(\alpha(t)))\,\gamma'(\alpha(t))\,\alpha'(t)\,dt = \int_a^b \phi(\gamma(s))\,\gamma'(s)\,ds = \int_\gamma \phi,$$

where we have made the substitution $s = \alpha(t)$, $ds = d\alpha(t) = \alpha'(t)\,dt$.


If $\alpha$ is orientation reversing, then $a$ and $b$ will be reversed in the fourth integral
above and to undo this reversal introduces a factor of $-1$. □


Definition 11.2.3. If $\gamma$ and $\lambda$ are two paths which have the same trace and if

$$\int_\gamma \phi = \int_\lambda \phi$$

for every 1-form $\phi$ defined on the common trace of $\gamma$ and $\lambda$, then we will say that
$\gamma$ and $\lambda$ are equivalent paths.


Theorem 11.2.2 says that if there is an orientation-preserving smooth parameter
change from $\gamma$ to $\lambda$, then the paths $\gamma$ and $\lambda$ are equivalent.


Remark 11.2.4. If $\gamma$ and $\lambda$ are paths and if there is a smooth parameter change
$\alpha$ from $\gamma$ to $\lambda$, then $\alpha$ has an inverse function $\alpha^{-1} : [a,b] \to [c,d]$ and it is a smooth
parameter change from $\lambda$ to $\gamma$ (see Exercise 11.2.6).


Example 11.2.5. In Example 11.1.9 the two curves $\gamma$ and $\lambda$ are shown to be
equivalent. Is there a smooth orientation-preserving parameter change from $\gamma$ to
$\lambda$? Is there a smooth orientation-preserving parameter change from $\lambda$ to $\gamma$?

Solution: The function $\alpha(t) = t^2$ is increasing and has the property that
$\lambda = \gamma \circ \alpha$. Also, it has a positive, continuous derivative on $(0,1)$ and so it is a smooth
parameter change. Note that $\alpha'$ is bounded on $(0,1)$ in this case. The smooth
parameter change going in the other direction (from $\lambda$ to $\gamma$) is $\alpha^{-1}(s) = \sqrt{s}$. This
function does not have a bounded derivative on $(0,1)$, but that is not a requirement
for a smooth parameter change.


Example 11.2.6. Consider the paths in $\mathbb{R}^2$ given by $\gamma(t) = (\cos t, \sin t)$ and $\lambda(t) =
(\cos t, -\sin t)$ for $0 \le t \le 2\pi$. Is there a smooth parameter change from $\gamma$ to $\lambda$?
Are $\gamma$ and $\lambda$ equivalent?


Solution: These paths each traverse the circle of radius 1 centered at $(0,0)$
in $\mathbb{R}^2$ once, but in opposite directions. The function $\alpha(t) = 2\pi - t$ is a smooth
parameter change from $\gamma$ to $\lambda$, since $\cos(2\pi - t) = \cos t$ and $\sin(2\pi - t) = -\sin t$.
However, $\alpha$ is orientation reversing, and so Theorem 11.2.2 tells us that $\gamma$ and $\lambda$
are not equivalent. We can confirm this by direct calculation if we choose the
1-form $\phi(x,y) = -y\,dx + x\,dy$. On $\gamma$ we have $x = \cos t$, $dx = -\sin t\,dt$, $y = \sin t$,
$dy = \cos t\,dt$. Thus,

$$\int_\gamma \phi = \int_0^{2\pi} (\sin^2 t + \cos^2 t)\,dt = \int_0^{2\pi} 1\,dt = 2\pi.$$

On $\lambda$, $x$ and $dx$ are the same, but $y = -\sin t$, $dy = -\cos t\,dt$. Thus,

$$\int_\lambda \phi = \int_0^{2\pi} (-\sin^2 t - \cos^2 t)\,dt = \int_0^{2\pi} (-1)\,dt = -2\pi.$$

Theorem 11.2.2 leads to a strategy which, for many paths $\gamma$ and $\lambda$ with the same
trace, yields a proof that they are equivalent paths. Suppose that the parameter
intervals for the two paths can each be partitioned into $n$ subintervals in such a
way that for $j = 1, \ldots, n$, $\gamma$ on its $j$th subinterval and $\lambda$ on its $j$th subinterval are
related by a smooth orientation-preserving parameter change $\alpha_j$, as in Theorem
11.2.2. If this can be done, then it clearly follows that $\int_\gamma \phi = \int_\lambda \phi$ for any 1-form $\phi$
which is defined on a set containing the common trace of $\gamma$ and $\lambda$. Hence, the two
paths are equivalent in this situation.


The question of parameter independence is particularly simple for smooth, 
simple curves. 


Theorem 11.2.7. If $\gamma$ and $\lambda$ are two smooth, simple non-closed curves in $\mathbb{R}^d$ which
begin at the same point, end at the same point, and have the same trace, then there
is an orientation-preserving smooth parameter change from $\gamma$ to $\lambda$. Hence, $\gamma$ and
$\lambda$ are equivalent in this case.


Proof. Let the parameter intervals for $\gamma$ and $\lambda$ be $[a,b]$ and $[c,d]$. For each $t \in [c,d]$
there is an $s \in [a,b]$ such that $\lambda(t) = \gamma(s)$. This is because both $\gamma$ and $\lambda$ have the
same trace. Furthermore, since $\gamma$ is one-to-one, there is only one such $s$ for each
$t$. We denote this $s$ by $\alpha(t)$. This defines a function $\alpha : [c,d] \to [a,b]$ such that
$\lambda(t) = \gamma(\alpha(t))$. We will show that $\alpha$ has a continuous positive derivative on $(c,d)$.
This follows from the Implicit Function Theorem, as we shall show below.

We set $F(s,t) = \lambda(t) - \gamma(s)$. Then $F$ is a smooth function from $[a,b] \times [c,d]$ to
$\mathbb{R}^d$. If $t_0$ is a point of $(c,d)$, we wish to show that $\alpha'(t)$ exists in a neighborhood of
$t_0$ and is continuous at $t_0$.

Let $s_0 = \alpha(t_0)$. Since $\gamma'(s) \neq 0$ for each $s$, it follows that $\frac{\partial f_j}{\partial s}(s_0, t_0) \neq 0$ for at
least one of the coordinate functions $f_j$ of $F$. By the Implicit Function Theorem,
there is a smooth function $\beta$ defined in a neighborhood of $t_0$ such that $\beta(t_0) = s_0$
and $f_j(s,t) = 0$ for $(s,t)$ in a neighborhood of $(s_0, t_0)$ if and only if $s = \beta(t)$.
Since we have $F(\alpha(t), t) = 0$ for all $t \in [c,d]$ by the choice of $\alpha$, we also have
$f_j(\alpha(t), t) = 0$. It follows that $\beta(t) = \alpha(t)$ in some neighborhood of $t_0$. Thus, $\alpha$ is
smooth in a neighborhood of $t_0$.


The fact that $\alpha'(t)$ is non-vanishing follows from the Chain Rule. Since $\lambda(t) =
\gamma(\alpha(t))$, the Chain Rule implies that

$$\lambda'(t) = \gamma'(\alpha(t))\,\alpha'(t).$$

Here, $\alpha'(t)$ is a scalar multiplying the vector $\gamma'(\alpha(t))$. If there were a point $t$ where
$\alpha'(t) = 0$, then we would have $\lambda'(t) = 0$ also, and this is not possible, since $\lambda'$ is
non-vanishing. Thus, $\alpha$ is a smooth parameter change from $\gamma$ to $\lambda$.


Since $\alpha'$ is non-vanishing on $(c,d)$, it is either strictly positive or strictly negative
by the Intermediate Value Theorem. Hence, $\alpha$ is either increasing or decreasing
on $[c,d]$. It must be increasing, since it takes $c$ to $a$ and $d$ to $b$. Thus, $\alpha$ is orientation
preserving. □


What if we do not assume that the two curves in the preceding theorem are 
non-closed? What if they are closed? Does the theorem still hold? If not, is there 
a way to modify the theorem so that it does hold in this case? These questions are 
dealt with in the exercises. 


Arc Length Parameterization. Suppose $\gamma$ is a smooth curve with parameter
interval $[a,b]$. We define a change of variables from $t$ to a new variable $s$ by setting

$$s(t) = \int_a^t \|\gamma'(u)\|\,du$$

for each $t \in [a,b]$. That is, $s(t)$ is the length of that part of the curve $\gamma$ for which the
parameter $u$ lies in the interval $[a,t]$. Furthermore, by the Fundamental Theorem
of Calculus,

$$ds = \|\gamma'(t)\|\,dt.$$

Since $\|\gamma'(t)\|$ is a positive continuous function of $t$ and since it is the derivative
of $s$, it follows that $s$, as a function of $t$, is a continuous, increasing function from
$[a,b]$ to $[0, \ell(\gamma)]$ which is smooth on $(a,b)$. Hence, its inverse function defines $t$ as a
continuous, increasing function of $s$ for $s \in [0, \ell(\gamma)]$ with image $[a,b]$. Furthermore,
this inverse function is smooth on $(0, \ell(\gamma))$. This defines a smooth parameter change from $\gamma$ to the curve
$\lambda(s) = \gamma(t(s))$.

The length of a curve remains the same after a smooth parameter change
(Exercise 11.2.7). Thus, given $s \in [0, \ell(\gamma)]$, the length of that part of $\lambda$ for which
the parameter lies between 0 and $s$ is the same as the length of that part of $\gamma$ for
which the parameter lies between $a$ and $t$. This is exactly

$$\int_a^t \|\gamma'(u)\|\,du = s. \tag{11.2.1}$$

That is, $s$ is the length of that part of $\lambda$ for which the parameter lies in $[0,s]$. A
smooth curve or a path with this property is said to be parameterized by arc length.


Since each path is made up of a number of smooth curves joined together, we 
have proved: 


Theorem 11.2.8. Each path in $\mathbb{R}^d$ may be reparameterized so as to be a path
parameterized by arc length.


Equation (11.2.1), when applied to the curve $\lambda$ parameterized by arc length,
yields

$$s = \ell(\lambda_s) = \int_0^s \|\lambda'(u)\|\,du,$$

where $\lambda_s$ denotes $\lambda$ restricted to $[0,s]$. On differentiating and using the Fundamental
Theorem of Calculus, we conclude that $\|\lambda'(s)\| = 1$ for each $s$. That is, $\lambda'(s)$ is a
unit vector. This unit vector is often denoted by $T$ and is called the unit tangent
vector to $\gamma$. A simple calculation shows that, in terms of $\gamma$, $T = \gamma'/\|\gamma'\|$.
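The construction of the arc length parameterization can be imitated numerically: tabulate $s(t)$, invert the table, and check that the reparameterized curve has speed close to 1. The sketch below makes several assumptions that are not in the text: the sample curve $\gamma(t) = (\cos t^2, \sin t^2)$ on $[1,2]$, a midpoint rule for $s(t)$, and linear interpolation to invert the table. All function names are illustrative.

```python
import bisect
import math

def arc_length_table(speed, a, b, n=2000):
    """Tabulate s(t) = integral of speed over [a, t] at n + 1 equally spaced points."""
    h = (b - a) / n
    ts = [a + k * h for k in range(n + 1)]
    ss = [0.0]
    for k in range(n):
        ss.append(ss[-1] + speed(ts[k] + 0.5 * h) * h)
    return ts, ss

def t_of_s(s, ts, ss):
    """Invert the increasing table s(t) by linear interpolation."""
    k = bisect.bisect_left(ss, s)
    if k == 0:
        return ts[0]
    if k >= len(ss):
        return ts[-1]
    w = (s - ss[k - 1]) / (ss[k] - ss[k - 1])
    return ts[k - 1] + w * (ts[k] - ts[k - 1])

def gamma(t):
    return (math.cos(t * t), math.sin(t * t))

def speed(t):
    # ||gamma'(t)|| = 2t for t > 0.
    return 2.0 * t

ts, ss = arc_length_table(speed, 1.0, 2.0)
total_length = ss[-1]

def lam(s):
    """The curve gamma reparameterized by (approximate) arc length."""
    return gamma(t_of_s(s, ts, ss))

def numerical_speed(s, delta=1e-3):
    """Central-difference estimate of ||lam'(s)||."""
    p, q = lam(s - delta), lam(s + delta)
    return math.hypot(q[0] - p[0], q[1] - p[1]) / (2 * delta)

print(total_length, numerical_speed(0.5), numerical_speed(1.5), numerical_speed(2.5))
```

For this sample curve, $s(t) = t^2 - 1$, so the total length is 3 and $\lambda(s) = (\cos(s+1), \sin(s+1))$; the estimated speeds should all be close to 1, in line with $\|\lambda'(s)\| = 1$.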


Classical Form for Path Integrals. Let $\phi = f_1\,dx_1 + \cdots + f_p\,dx_p$ be a 1-form on
a subset $A$ of $\mathbb{R}^p$ and let $\gamma$ be a simple path in $A$ with trace $C$. If $F = (f_1, \ldots, f_p)$
is the vector-valued function determined by the components of $\phi$, then the path
integral of $\phi$ over $\gamma$ is classically written as

$$\int_\gamma \phi = \int_C F \cdot T\,ds, \tag{11.2.2}$$

where $T = \gamma'(t)/\|\gamma'(t)\|$ is the unit tangent vector to $\gamma$ and $ds = \|\gamma'(t)\|\,dt$ is the
differential of arc length along $\gamma$, as above. Here the integral on the right is just
another way of denoting

$$\int_a^b F(\gamma(t)) \cdot T(t)\,\|\gamma'(t)\|\,dt = \int_a^b F(\gamma(t)) \cdot \gamma'(t)\,dt.$$

Integrals of this type arise in many contexts in physics. For example, if $F$ is a
force field acting on an object, then the above path integral represents the work
done by the force field as the object moves along the path $\gamma$.

The classical notation represents the integral of a 1-form along a path as the
integral of an ordinary function $F \cdot T$ with respect to arc length along the path.
Such an integral can be defined for any continuous function along the path. This
leads to the definition of an integral along a path for ordinary continuous functions
as opposed to 1-forms:


Definition 11.2.9. If $f$ is a continuous real-valued function, defined on the trace
$C$ of a path $\gamma$ with parameter interval $[a,b]$, then we define

$$\int_C f\,ds = \int_a^b f(\gamma(t))\,\|\gamma'(t)\|\,dt.$$

This is called the integral of $f$ over $C$ with respect to arc length.




Change of Variables for 1-Forms. A smooth parameter change is one kind
of change of variables. It is a change in the independent variable of a path. It
is equally important to understand how to deal with a change of variables in the
dependent variable space. By this, we mean a smooth one-to-one function from one
open set in $\mathbb{R}^d$ to another which has a non-singular differential.


More generally, let $U$ be an open subset of $\mathbb{R}^p$ and let $H : U \to \mathbb{R}^q$ be any
smooth function. The function $H$ could be a smooth change of variables or possibly
a function which parameterizes a piece of a $p$-surface in $\mathbb{R}^q$. It is important to
understand how functions, paths, and differential forms are transformed by $H$.
Such an understanding will allow us to solve problems concerning functions, paths,
and forms on complicated sets by reducing the problem to an analogous problem
on a simpler set such as a square or a cube. We have already done this type of
thing. This is exactly what is involved when we parameterize a path in order to
express the integral of a 1-form over the path as an integral of a function over an
interval $I$ on the line.

With $U \subset \mathbb{R}^p$ and $H : U \to \mathbb{R}^q$ as above, if $\gamma : I \to U$ is a path in $U$, then
$H \circ \gamma : I \to \mathbb{R}^q$ is a path in $\mathbb{R}^q$. On the other hand, if $f$ is a function defined on a
set containing $H(U)$, then $f \circ H$ is a function defined on $U$. We will often call this
function $H^*(f)$. Note that, while $\gamma \mapsto H \circ \gamma$ takes paths in $\mathbb{R}^p$ to paths in $\mathbb{R}^q$, the
operation $f \mapsto H^*(f)$ goes the other way; that is, it takes functions on a subset
of $\mathbb{R}^q$ to functions on a subset of $\mathbb{R}^p$. Note that there is the following relationship
between the two operations: if we evaluate the function $H^*(f)$ along the curve $\gamma$,
the result is the real-valued function $H^*(f) \circ \gamma$ on $I$. On the other hand,

\[ H^*(f) \circ \gamma = (f \circ H) \circ \gamma = f \circ (H \circ \gamma), \]

which is the result of evaluating $f$ along the curve $H \circ \gamma$.

How do 1-forms transform under a function $H$, as above? This is best understood
by seeing how a 1-form of the form $df$ should transform.

Let $f$ be a smooth function defined on $H(U)$ and let $df$ be its differential (considered
as a vector-valued function on $H(U)$). Under $H$, $f$ transforms to $H^*(f) = f \circ H$.
The differential of this function, by the Chain Rule, is the vector-matrix product
$(df \circ H)\,dH$. This suggests that we should regard $(df \circ H)\,dH$ as the appropriate
transform of $df$ under the function $H$. This, in turn, suggests that the function $H$
should transform every differential 1-form on $H(U)$ in the same manner. That is, $H$
should take a differential 1-form $\phi$ to $(\phi \circ H)\,dH$, where $\phi \circ H$ is a vector-valued
function and $dH$ is a matrix-valued function on $U$, and $(\phi \circ H)\,dH$ is the vector-matrix
product of $\phi \circ H$ with $dH$. This leads to the following definition.


Definition 11.2.10. If $U$ is an open subset of $\mathbb{R}^p$ and $H : U \to \mathbb{R}^q$ is a smooth
function, then for each function (0-form) $f$ on $H(U)$ and each 1-form $\phi$ on $H(U)$,
we define a function $H^*(f)$ and 1-form $H^*(\phi)$ on $U$ by

\[ H^*(f) = f \circ H \quad\text{and}\quad H^*(\phi) = (\phi \circ H)\,dH. \]

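Example 11.2.11. Let $H : U \to \mathbb{R}^3$ be a smooth function, as above, with $U$ an
open subset of $\mathbb{R}^2$. If we regard the coordinates $(x, y, z)$ of points in the image of $H$
to be functions on $U$ of the variables $(u, v)$ through the equation $(x, y, z) = H(u, v)$
and if $\phi(x, y, z) = f(x, y, z)\,dx + g(x, y, z)\,dy + h(x, y, z)\,dz$ is a 1-form on $H(U)$,
then write out $H^*(\phi)$ in the $(u, v)$ coordinates.

Solution: In vector notation, the new 1-form is

\[
H^*(\phi) = (\phi \circ H)\,dH = (f \circ H,\; g \circ H,\; h \circ H)
\begin{pmatrix}
\dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[1ex]
\dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \\[1ex]
\dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v}
\end{pmatrix}
\]
\[
= \left( f \circ H\,\frac{\partial x}{\partial u} + g \circ H\,\frac{\partial y}{\partial u} + h \circ H\,\frac{\partial z}{\partial u},\;\;
f \circ H\,\frac{\partial x}{\partial v} + g \circ H\,\frac{\partial y}{\partial v} + h \circ H\,\frac{\partial z}{\partial v} \right),
\]

where all functions are evaluated at $(u, v)$. If we write this in terms of the basis
vectors $du$ and $dv$, it becomes

\[
\left( f \circ H\,\frac{\partial x}{\partial u} + g \circ H\,\frac{\partial y}{\partial u} + h \circ H\,\frac{\partial z}{\partial u} \right) du
+ \left( f \circ H\,\frac{\partial x}{\partial v} + g \circ H\,\frac{\partial y}{\partial v} + h \circ H\,\frac{\partial z}{\partial v} \right) dv.
\]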


Remark 11.2.12. An easy way to remember how a 1-form $\phi = f\,dx + g\,dy + h\,dz$
in $\mathbb{R}^3$ transforms under a function $H : U \to \mathbb{R}^3$ with $U \subset \mathbb{R}^2$ is to think of making
the replacements

\[ (x, y, z) = H(u, v) \]

for $(x, y, z)$ in $f(x, y, z)$, $g(x, y, z)$, and $h(x, y, z)$ and the replacements

\[
dx = \frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv, \qquad
dy = \frac{\partial y}{\partial u}\,du + \frac{\partial y}{\partial v}\,dv, \qquad
dz = \frac{\partial z}{\partial u}\,du + \frac{\partial z}{\partial v}\,dv.
\]

This leads to the same expression for the transformed 1-form as is obtained in the
preceding example. The same formalism works for transforming 1-forms on $\mathbb{R}^q$ to
1-forms on $\mathbb{R}^p$ under any smooth function $H$ from an open subset of $\mathbb{R}^p$ to $\mathbb{R}^q$. Note
that, when $p = q = 1$, this formalism is just the procedure for replacing $f(x)\,dx$ by
the appropriate expression when doing a substitution $x = H(u)$ in an integral on
the line.


Example 11.2.13. Consider the function $H(r, \theta) = (r\cos\theta, r\sin\theta)$ for $r > 0$ and
$-\pi < \theta < \pi$. This is the change of variables $x = r\cos\theta$, $y = r\sin\theta$ between
rectangular and polar coordinates. For the 1-form $\phi(x, y) = x\,dx + y\,dy$, what is
$H^*(\phi)$?
Solution: We make the replacements
\[ x = r\cos\theta, \qquad y = r\sin\theta, \]
\[ dx = \cos\theta\,dr - r\sin\theta\,d\theta, \]
\[ dy = \sin\theta\,dr + r\cos\theta\,d\theta. \]




Then $\phi(x, y) = x\,dx + y\,dy$ is transformed to
\[ H^*(\phi) = r\cos^2\theta\,dr - r^2\sin\theta\cos\theta\,d\theta + r\sin^2\theta\,dr + r^2\sin\theta\cos\theta\,d\theta \]
\[ = r(\cos^2\theta + \sin^2\theta)\,dr = r\,dr. \]
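The rule $H^*(\phi) = (\phi \circ H)\,dH$ can also be applied numerically at a single point. The sketch below is only an illustration under stated assumptions: the helper names `jacobian` and `pullback_1form` are hypothetical, and $dH$ is approximated by central differences rather than computed symbolically.

```python
import math

def jacobian(H, point, h=1e-6):
    """Finite-difference matrix of dH at point: rows index outputs, columns inputs."""
    columns = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        # column j holds the partial derivatives with respect to the j-th input
        columns.append([(a - b) / (2 * h) for a, b in zip(H(*plus), H(*minus))])
    n_out = len(columns[0])
    return [[columns[j][i] for j in range(len(point))] for i in range(n_out)]

def pullback_1form(phi, H, point):
    """Coefficients of H*(phi) = (phi o H) dH at point, as a row vector."""
    coeffs = phi(*H(*point))          # phi evaluated at H(point)
    dH = jacobian(H, point)
    return [sum(coeffs[i] * dH[i][j] for i in range(len(coeffs)))
            for j in range(len(point))]

def polar(r, theta):
    """The polar coordinate map H(r, theta) of Example 11.2.13."""
    return (r * math.cos(theta), r * math.sin(theta))

def phi(x, y):
    """Coefficient vector of the 1-form x dx + y dy."""
    return (x, y)

# Coefficients of dr and dtheta in H*(phi) at the point (r, theta) = (2.0, 0.7).
dr_coeff, dtheta_coeff = pullback_1form(phi, polar, (2.0, 0.7))
```

At $(r, \theta) = (2, 0.7)$ the computed coefficients agree, up to finite-difference error, with those of $r\,dr$: the coefficient of $dr$ is approximately 2 and the coefficient of $d\theta$ is approximately 0.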

Change of Variables in Path Integrals. The transformation law for 1-forms
under a smooth transformation is the correct one if we want path integrals to be
preserved.


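Theorem 11.2.14. If $U$ is an open subset of $\mathbb{R}^p$, $H : U \to \mathbb{R}^q$ is a smooth
transformation, $\phi$ is a 1-form on $H(U)$, and $\gamma : I \to U$ is a path in $U$, then

\[ \int_\gamma H^*(\phi) = \int_{H \circ \gamma} \phi. \]

Proof. Ultimately, this reduces to the Chain Rule and the definition of the integral
of a 1-form over a path. That is, if $I = [a, b]$,

\[ \int_\gamma H^*(\phi) = \int_\gamma (\phi \circ H)\,dH = \int_a^b \phi(H(\gamma(t)))\,dH(\gamma(t))\,\gamma'(t)\,dt \]
\[ = \int_a^b \phi(H \circ \gamma(t))\,(H \circ \gamma)'(t)\,dt = \int_{H \circ \gamma} \phi. \qquad \Box \]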


Example 11.2.15. Find $\int_\lambda (x\,dx + y\,dy)$ for the path $\lambda(t) = (\cos t, \sin t)$ with
$-\pi < t < \pi$, by first changing to polar coordinates (as in Example 11.2.13) and then
integrating the resulting 1-form over the path given by

\[ (r, \theta) = \gamma(t) = (1, t), \quad\text{for } -\pi < t < \pi. \]

Solution: By Example 11.2.13, the form $x\,dx + y\,dy$ transforms to $r\,dr$ under
the transform $H$ to polar coordinates. Also, $\lambda = H \circ \gamma$. Hence, by the previous
theorem,

\[ \int_\lambda (x\,dx + y\,dy) = \int_\gamma r\,dr = 0, \]

since $r = 1$ and $dr = 0$ on $\gamma$.
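Theorem 11.2.14 can be tested numerically on a path for which the answer is not zero. The following sketch assumes a hypothetical helper `path_integral` (midpoint rule, central-difference velocity) and an illustrative path $\gamma(t) = (1 + t, t)$, $0 \le t \le 1$, in the $(r, \theta)$-plane; neither appears in the text.

```python
import math

def path_integral(phi, gamma, a, b, n=2000, h=1e-6):
    """Approximate the integral of the 1-form phi over the path gamma on [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        p_plus, p_minus = gamma(t + h), gamma(t - h)
        velocity = [(u - v) / (2 * h) for u, v in zip(p_plus, p_minus)]
        # phi(gamma(t)) . gamma'(t) dt
        total += sum(c * w for c, w in zip(phi(*gamma(t)), velocity)) * dt
    return total

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def phi(x, y):
    return (x, y)            # the 1-form x dx + y dy

def pulled_back(r, theta):
    return (r, 0.0)          # H*(phi) = r dr, from Example 11.2.13

def gamma(t):
    return (1.0 + t, t)      # an illustrative path in the (r, theta)-plane

def H_of_gamma(t):
    return polar(*gamma(t))  # the image path H o gamma in the (x, y)-plane

lhs = path_integral(pulled_back, gamma, 0.0, 1.0)   # integral of H*(phi) over gamma
rhs = path_integral(phi, H_of_gamma, 0.0, 1.0)      # integral of phi over H o gamma
```

Both values approximate $\int_0^1 (1 + t)\,dt = 3/2$, as the theorem predicts.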


—— 
Exercise Set 11.2

1. Are $\gamma(t) = (t^3, t^2)$, $0 < t < 1$, and $\lambda(t) = (\sin^3 t, 1 - \cos^2 t)$, $0 < t < \pi/2$,
equivalent curves? Justify your answer.

2. Are $\gamma(t) = (\cos t, \sin t)$, $0 < t < 2\pi$, and $\lambda(t) = (\cos t, \sin t)$, $0 < t < 4\pi$,
equivalent curves? Justify your answer.

3. Are $\gamma(s) = (s, \sqrt{1 - s^2})$, $-1 < s < 1$, and $\lambda(t) = (\cos t, \sin t)$, $0 < t < \pi$,
equivalent curves? Does the answer change if the parameter interval for $\lambda$ is
changed to $-\pi < t < 0$? Justify your answer.

4. If $\gamma$ is a path with parameter interval $[a, b]$, define a smooth parameter change
from $\gamma$ to an equivalent path which has $[0, 1]$ as parameter interval. Hint:
You simply need to find a smooth increasing function $\alpha : [0, 1] \to [a, b]$ and
then set $\lambda = \gamma \circ \alpha$. There are many such functions, but there is one which is
particularly simple.

5. Give an example to show that the conclusion of Theorem 11.2.7 does not hold
if we do not assume the paths are non-closed. Tell how to restate the theorem
so that it does hold for closed curves as well as non-closed curves.

6. Prove that if $\gamma$ and $\lambda$ are smooth paths and $\alpha : [c, d] \to [a, b]$ is a smooth
parameter change from $\gamma$ to $\lambda$, then $\alpha$ has a smooth inverse function $\alpha^{-1} :
[a, b] \to [c, d]$ which is a smooth parameter change from $\lambda$ to $\gamma$. Furthermore,
$\alpha$ is orientation preserving if and only if $\alpha^{-1}$ is orientation preserving.

7. Show that a smooth parameter change does not change the length of a smooth
curve.

8. If $\gamma(t) = (\cos 2\pi t, \sin 2\pi t)$ for $0 < t < 1$, describe a curve equivalent to $\gamma$ which
is parameterized by arc length.

9. Express the differential form $y\,dx - x\,dy$ in polar coordinates (see Example
11.2.13).

10. Calculate $\int_\lambda (y\,dx - x\,dy)$, where $\lambda(t) = (\cos t, \sin t)$ for $-\pi < t < \pi$, by first
expressing this integral in polar coordinates, as in Example 11.2.15.

11. Give a different solution to the problem in Example 11.2.13 by noticing that
$x\,dx + y\,dy$ is $df$ for the function $f(x, y) = (x^2 + y^2)/2$. What does $f$ transform
into under the change to polar coordinates? How does this lead immediately
to the solution in Example 11.2.15?

12. What does the differential form $x\,dx + y\,dy + z\,dz$ on $\mathbb{R}^3$ transform to under
the change to spherical coordinates?

13. What does the differential form $y\,dx - x\,dy + dz$ transform to under the change
of coordinates $x = u + 2v$, $y = 3u - v$, $z = u + v + w$?
14. If $H : (-\pi, \pi) \times (-\pi, \pi) \to \mathbb{R}^3$ is defined by
\[ H(u, v) = (\cos u\cos v, \sin u\cos v, \sin v), \]
what does the differential form $x\,dx + y\,dy + z\,dz$ transform to under $H$?


11.3. Differential Forms of Higher Order 


The statements and proofs of the main integration theorems of this chapter (Green's 
Theorem, Gauss’s Theorem, and Stokes’s Theorem) all involve the algebra of differ- 
ential forms. We have already seen how differential 1-forms enter into the definition 
of path integrals. Second-order differential forms are involved in the definitions of 
surface integrals and third-order forms are related to integrals over solid regions in 
$\mathbb{R}^3$. In this section we introduce higher-order differential forms, the operations we
shall perform on them, and the transformation rules that govern them. 


2-Forms. If coordinate functions $x_1, \ldots, x_q$ are chosen for $\mathbb{R}^q$, then we begin
by constructing a vector space over $\mathbb{R}$ that has certain symbols $dx_i \wedge dx_j$ as basis
elements. Here, we declare that

\[ dx_i \wedge dx_j = -dx_j \wedge dx_i \quad\text{and}\quad dx_i \wedge dx_i = 0 \tag{11.3.1} \]

for all $i$ and $j$. Our basis vectors will then be the expressions $dx_i \wedge dx_j$ for which
$i < j$. Whenever a symbol $dx_j \wedge dx_i$ with $j > i$ occurs in a calculation, we simply
replace it by $-dx_i \wedge dx_j$. Of course, if $dx_i \wedge dx_i$ occurs, it is replaced by 0.


Given a subset $E$ of $\mathbb{R}^q$, a differential 2-form is a continuous function on $E$
with values in the vector space described above. Thus, a differential 2-form, when
written out in terms of the basis described above, yields an expression of the form

\[ \phi = \sum_{i<j} f_{ij}(x)\,dx_i \wedge dx_j, \]

where each $f_{ij}$ is a continuous function on $E$.
We may construct 2-forms from 1-forms in two ways.


First, there is a product operation, called exterior or wedge product, which
assigns to each pair $\phi$, $\psi$ of 1-forms a 2-form $\phi \wedge \psi$. If $\phi = \sum_{i=1}^q f_i\,dx_i$ and
$\psi = \sum_{j=1}^q g_j\,dx_j$, then

\[ \phi \wedge \psi = \sum_{i,j=1}^q f_i g_j\,dx_i \wedge dx_j = \sum_{i<j} (f_i g_j - f_j g_i)\,dx_i \wedge dx_j. \]

Here, in going from the first to the second sum, we have used the relations (11.3.1)
to express the sum in terms of the basis vectors $dx_i \wedge dx_j$ for $i < j$.


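Second, we may take the differential of a 1-form: if $\phi = \sum_{j=1}^q f_j\,dx_j$ is a 1-form
defined on an open set $U \subset \mathbb{R}^q$, then we define a 2-form $d\phi$, called the exterior
differential of $\phi$, by

\[ d\phi = \sum_{j=1}^q df_j \wedge dx_j = \sum_{i,j=1}^q \frac{\partial f_j}{\partial x_i}\,dx_i \wedge dx_j
= \sum_{i<j} \left( \frac{\partial f_j}{\partial x_i} - \frac{\partial f_i}{\partial x_j} \right) dx_i \wedge dx_j. \]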


Note that we previously defined the differential of a function f (a 0-form) to 
be a certain 1-form df. Now we have defined the differential of a 1-form to be a 
certain 2-form. In general, the differential of a p-form will be a (p + 1)-form. 

Theorem 11.3.1 follows directly from the definitions. The proof is left to the 
exercises. 


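Theorem 11.3.1. Let $\phi$, $\theta$, and $\psi$ be differentiable 1-forms and let $f$ be a
differentiable function defined on an open set $U$. Then

(a) $\phi \wedge \psi = -\psi \wedge \phi$;

(b) $\phi \wedge (\theta + \psi) = \phi \wedge \theta + \phi \wedge \psi$;

(c) $f(\phi \wedge \psi) = (f\phi) \wedge \psi = \phi \wedge (f\psi)$;

(d) $d(\phi + \psi) = d\phi + d\psi$;

(e) $d(f\phi) = df \wedge \phi + f\,d\phi$.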




On $\mathbb{R}^2$, 2-forms are particularly simple. If $x$ and $y$ are the coordinate functions,
then $dx \wedge dy$ is the only basis vector for 2-forms and so all 2-forms can be expressed
as $f\,dx \wedge dy$ for some continuous function $f$.

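Example 11.3.2. Given 1-forms $\phi = f\,dx + g\,dy$ and $\psi = h\,dx + k\,dy$, find
(a) $\phi \wedge \psi$ and (b) $d\phi$.
Solution:
(a) \[ \phi \wedge \psi = fh\,dx \wedge dx + fk\,dx \wedge dy + gh\,dy \wedge dx + gk\,dy \wedge dy = (fk - gh)\,dx \wedge dy. \]
(b) \[ d\phi = df \wedge dx + dg \wedge dy
= \left( \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy \right) \wedge dx
+ \left( \frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial y}\,dy \right) \wedge dy
= \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dx \wedge dy. \]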

On $\mathbb{R}^3$, the basis vectors $dx \wedge dy$, $dy \wedge dz$, and $dx \wedge dz$ are independent and
generate a three-dimensional vector space. Thus, a typical differential 2-form on
an open subset $U$ of $\mathbb{R}^3$ has the form

\[ f_1\,dy \wedge dz + f_2\,dx \wedge dz + f_3\,dx \wedge dy, \]

where $f_1$, $f_2$, and $f_3$ are continuous functions on $U$.

In some contexts, a function $F = (f_1, f_2, f_3)$ from $U \subset \mathbb{R}^3$ to $\mathbb{R}^3$ is called a
vector field on $U$. Thus, a 2-form $\phi = f_1\,dy \wedge dz + f_2\,dz \wedge dx + f_3\,dx \wedge dy$ in $\mathbb{R}^3$
determines a vector field $F = (f_1, f_2, f_3)$. We will call this the component vector
field of $\phi$. Of course, a 1-form $g_1\,dx + g_2\,dy + g_3\,dz$ in $\mathbb{R}^3$ also determines a component
vector field $G = (g_1, g_2, g_3)$.


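Example 11.3.3. If $\phi = f_1\,dx + f_2\,dy + f_3\,dz$ and $\psi = g_1\,dx + g_2\,dy + g_3\,dz$ are
1-forms on $U \subset \mathbb{R}^3$, then find

(a) $\phi \wedge \psi$ and (b) $d\phi$.

Solution: Using (11.3.1) and collecting terms involving $dx \wedge dy$, $dy \wedge dz$, and
$dx \wedge dz$, we obtain

\[ \phi \wedge \psi = (f_2 g_3 - f_3 g_2)\,dy \wedge dz + (f_3 g_1 - f_1 g_3)\,dz \wedge dx + (f_1 g_2 - f_2 g_1)\,dx \wedge dy \]

and

\[ d\phi = df_1 \wedge dx + df_2 \wedge dy + df_3 \wedge dz \]
\[ = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z} \right) dy \wedge dz
+ \left( \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x} \right) dz \wedge dx
+ \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx \wedge dy. \]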

Remark 11.3.4. Note that if $F$ is the component vector field of the 1-form $\phi$ and
$G$ is the component vector field of the 1-form $\psi$, then the formulas of the preceding
example say that

(1) the component vector field of $\phi \wedge \psi$ is $F \times G$ and

(2) the component vector field of $d\phi$ is $\operatorname{curl} F$,

in terms of the classical cross product "$\times$" and "curl" operations.
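Part (1) of the remark can be checked at a single point by computing the wedge product from the general formula $\sum_{i<j} (f_i g_j - f_j g_i)\,dx_i \wedge dx_j$ and comparing with the cross product. This is a minimal sketch; the function names and the sample coefficient values are illustrative assumptions.

```python
def wedge_1forms(f, g):
    """Coefficients of (sum f_i dx_i) ^ (sum g_j dx_j) on the basis dx_i ^ dx_j, i < j.

    f and g hold the coefficient values of the two 1-forms at one point.
    Keys of the returned dict are 0-based index pairs (i, j) with i < j.
    """
    q = len(f)
    return {(i, j): f[i] * g[j] - f[j] * g[i]
            for i in range(q) for j in range(i + 1, q)}

def cross(f, g):
    """Classical cross product of two vectors in R^3."""
    return (f[1] * g[2] - f[2] * g[1],
            f[2] * g[0] - f[0] * g[2],
            f[0] * g[1] - f[1] * g[0])

f = (1.0, 2.0, 3.0)      # sample values of (f1, f2, f3) at a point
g = (-4.0, 0.5, 2.0)     # sample values of (g1, g2, g3) at a point
w = wedge_1forms(f, g)

# Component vector field uses the basis dy^dz, dz^dx, dx^dy.
# Since dz^dx = -dx^dz, the middle component is the negative of the (0, 2) coefficient.
component_field = (w[(1, 2)], -w[(0, 2)], w[(0, 1)])
```

The sign change in the middle component is the reason the text writes 2-forms on $\mathbb{R}^3$ with $dz \wedge dx$ rather than $dx \wedge dz$ when a component vector field is wanted.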




3-forms. A differential 3-form on an open subset $U$ of $\mathbb{R}^q$ is a sum of expressions
of the form

\[ f\,dx_i \wedge dx_j \wedge dx_k, \]

where $f$ is a continuous function on $U$. As in (11.3.1), interchanging any two
adjacent terms $dx_i$, $dx_j$, $dx_k$ in this expression changes the sign of the expression.
If two of $i, j, k$ are equal, then the expression is understood to be equal to 0. It
follows from this that every 3-form on $U$ may be expressed as a sum of forms as
above with $i < j < k$.

In the obvious way, the wedge product of three 1-forms is a 3-form and the
wedge product of a 1-form with a 2-form is a 3-form. We define the exterior
differential $d\phi$ of a 2-form

\[ \phi = \sum_{i<j} f_{ij}\,dx_i \wedge dx_j \]

to be the 3-form

\[ d\phi = \sum_{i<j} df_{ij} \wedge dx_i \wedge dx_j = \sum_{i<j} \sum_k \frac{\partial f_{ij}}{\partial x_k}\,dx_k \wedge dx_i \wedge dx_j. \]

i<j i<j k 
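Example 11.3.5. If $\phi = f_1\,dy \wedge dz + f_2\,dz \wedge dx + f_3\,dx \wedge dy$ is a 2-form on an open
subset of $\mathbb{R}^3$, find $d\phi$.
Solution: By definition,
\[ d\phi = df_1 \wedge dy \wedge dz + df_2 \wedge dz \wedge dx + df_3 \wedge dx \wedge dy. \]

Since $df_1 = \dfrac{\partial f_1}{\partial x}\,dx + \dfrac{\partial f_1}{\partial y}\,dy + \dfrac{\partial f_1}{\partial z}\,dz$ and since $dy \wedge dy \wedge dz = 0$ and $dz \wedge dy \wedge dz = 0$,
the only non-zero term in $df_1 \wedge dy \wedge dz$ will be the term involving $dx \wedge dy \wedge dz$.
Similar statements hold for $df_2 \wedge dz \wedge dx$ and $df_3 \wedge dx \wedge dy$. It follows that

\[ d\phi = \frac{\partial f_1}{\partial x}\,dx \wedge dy \wedge dz + \frac{\partial f_2}{\partial y}\,dy \wedge dz \wedge dx + \frac{\partial f_3}{\partial z}\,dz \wedge dx \wedge dy \]
\[ = \left( \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \right) dx \wedge dy \wedge dz = \operatorname{div} F\,dx \wedge dy \wedge dz, \]

if $F$ is the component vector field of $\phi$. Here, div is the classical divergence operation
on vector fields in $\mathbb{R}^3$.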


Theorem 11.3.6. Let $f$ be a function which is $C^2$ on an open set $U \subset \mathbb{R}^q$ and let
$\phi$ be a 1-form with coefficients which are $C^2$ on $U$. Then

(a) $d(df) = 0$ and
(b) $d(d\phi) = 0$.


Proof. We will prove part (a) and leave part (b) for the exercises.
We have

\[ df = \sum_{j=1}^q \frac{\partial f}{\partial x_j}\,dx_j \]

and

\[ d(df) = \sum_{j,k=1}^q \frac{\partial^2 f}{\partial x_k\,\partial x_j}\,dx_k \wedge dx_j. \tag{11.3.2} \]

Now for each pair of indices $(j, k)$ that occurs in this sum, the opposite pair $(k, j)$
also occurs. Furthermore,

\[ \frac{\partial^2 f}{\partial x_j\,\partial x_k} = \frac{\partial^2 f}{\partial x_k\,\partial x_j} \quad\text{and}\quad dx_j \wedge dx_k = -dx_k \wedge dx_j \]

by Theorem 9.1.6 (since $f$ is $C^2$) and by Theorem 11.3.1(a). It follows that the
$jk$th term and the $kj$th term in (11.3.2) cancel each other and the sum is 0. This
proves part (a) of the theorem. $\Box$
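As a concrete instance of part (a), the cancellation can be followed by hand for one function. The choice $f(x, y) = x^2 y$ on $\mathbb{R}^2$ below is an illustrative assumption and is not taken from the text; the computation uses only the formula for $d\phi$ from Example 11.3.2.

```latex
\[
  f(x,y) = x^2 y, \qquad df = 2xy\,dx + x^2\,dy ,
\]
\[
  d(df) = \left( \frac{\partial (x^2)}{\partial x} - \frac{\partial (2xy)}{\partial y} \right) dx \wedge dy
        = (2x - 2x)\,dx \wedge dy = 0 .
\]
```

The two terms that cancel are exactly the mixed partial derivatives $\partial^2 f/\partial x\,\partial y$ and $\partial^2 f/\partial y\,\partial x$, which is the mechanism used in the proof.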


Although we won't do it here, one can of course define differential forms of any
non-negative degree $p$ and define the exterior differential of such a form. What the
above theorem says for 1-forms and 2-forms is true for any $C^2$ $p$-form $\phi$; that is,
$d^2\phi = d(d\phi) = 0$. A differential form $\phi$ is said to be closed if $d\phi = 0$ and exact if
$\phi = d\psi$ for some form $\psi$. Thus, exact $C^2$ forms are always closed. How about the
converse? It turns out that the converse is not true in general, but it is true if the
domain $U$ of the form satisfies certain topological conditions. In particular, it is
true if $U$ is convex. We explicitly state this here for 1-forms. The proof is left to
the exercises.


Theorem 11.3.7. If $U$ is a convex set and $\phi$ is a closed 1-form on $U$ ($d\phi = 0$),
then $\phi$ is exact ($\phi = df$ for some $C^2$ function $f$ on $U$).


Remark 11.3.8. We may summarize the relationship between the exterior
differential operation $d$ and its classical counterparts for vector functions on $\mathbb{R}^3$ as
follows: if $f$ is a function, $\phi$ is a 1-form with component vector field $F$, and $\omega$ is a
2-form with component vector field $G$, all defined on an open subset of $\mathbb{R}^3$, then
(1) the component vector field of $df$ is $\operatorname{grad} f$;
(2) the component vector field of $d\phi$ is $\operatorname{curl} F$; and
(3) the coefficient function of $d\omega$ is $\operatorname{div} G$.


Transformation Laws for 2-Forms and 3-Forms. If $H : U \to \mathbb{R}^q$ is a function
defined on an open subset $U$ of $\mathbb{R}^p$, then we would like 2-forms and 3-forms to
transform under $H$ in a way which is consistent with our earlier rules for transforming
functions and 1-forms and in a way that preserves wedge products. This leads to
the following definition.

Definition 11.3.9. With $H$ as above, if $\phi = \sum_{i<j} f_{ij}\,dx_i \wedge dx_j$ is a 2-form and
$\omega = \sum_{i<j<k} f_{ijk}\,dx_i \wedge dx_j \wedge dx_k$ is a 3-form defined on a set containing $H(U)$, then
we define $H^*(\phi)$ and $H^*(\omega)$ as follows:

\[ H^*(\phi) = \sum_{i<j} H^*(f_{ij})\,H^*(dx_i) \wedge H^*(dx_j), \]

\[ H^*(\omega) = \sum_{i<j<k} H^*(f_{ijk})\,H^*(dx_i) \wedge H^*(dx_j) \wedge H^*(dx_k). \]


Of course, we may define $p$-forms on $U$ for any non-negative integer $p$, not
just for $p = 0, 1, 2, 3$. The appropriate transformation law for such a $p$-form under
$H : U \to \mathbb{R}^q$ is the obvious extension of the above laws for $p \le 3$. Note that if $p$ is
greater than the dimension of the underlying space, then 0 is the only $p$-form.


In the following theorem, parts (a) and (b) follow immediately from the defi- 
nitions and part (c) has a simple proof which is left to the exercises. 




Theorem 11.3.10. Let $\phi$ and $\psi$ be two differential forms on an open set $V$ in $\mathbb{R}^q$
and let $f$ be a function on $V$. If $U$ is an open subset of $\mathbb{R}^p$ and $H : U \to V$ is a
smooth function, then

(a) $H^*(f\phi) = H^*(f)\,H^*(\phi)$;

(b) $H^*(\phi \wedge \psi) = H^*(\phi) \wedge H^*(\psi)$; and

(c) $H^*(d\phi) = d\,H^*(\phi)$.

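Example 11.3.11. If $U$ is an open subset of $\mathbb{R}^2$, $H : U \to \mathbb{R}^2$ is a smooth
transformation, and $\phi(x, y) = f(x, y)\,dx \wedge dy$ is a 2-form defined on $H(U)$, then find an
explicit expression for $H^*(\phi)$.

Solution: As in Remark 11.2.12, we may think of $H$ as a change of variables
\[ x = h_1(u, v), \qquad y = h_2(u, v) \]
and simply replace $x$ and $y$ by $h_1(u, v)$ and $h_2(u, v)$ in $f(x, y)$ and in $dx$ and $dy$.
This leads to

\[ dx = \frac{\partial h_1}{\partial u}\,du + \frac{\partial h_1}{\partial v}\,dv, \qquad dy = \frac{\partial h_2}{\partial u}\,du + \frac{\partial h_2}{\partial v}\,dv, \quad\text{and} \]
\[ dx \wedge dy = \left( \frac{\partial h_1}{\partial u}\frac{\partial h_2}{\partial v} - \frac{\partial h_1}{\partial v}\frac{\partial h_2}{\partial u} \right) du \wedge dv = \det(dH)\,du \wedge dv. \]

More precisely, $dx \wedge dy$, when expressed in the $u, v$ coordinates, becomes
\[ H^*(dx \wedge dy) = \det(dH)\,du \wedge dv. \]
Since $H^*(f) = f \circ H$, we conclude that
\[ H^*(\phi) = H^*(f)\,H^*(dx \wedge dy) = f \circ H\,\det(dH)\,du \wedge dv. \]
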
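Example 11.3.12. If $U$ is an open subset of $\mathbb{R}^2$, $H : U \to \mathbb{R}^3$ is a smooth
transformation, and
\[ \phi(x, y, z) = f_1(x, y, z)\,dy \wedge dz + f_2(x, y, z)\,dz \wedge dx + f_3(x, y, z)\,dx \wedge dy \]
is a 2-form defined on $H(U)$, then find an explicit expression for $H^*(\phi)$.

Solution: If $H(u, v) = (h_1(u, v), h_2(u, v), h_3(u, v))$, then we may think of $H$
as defining a change of variables

\[ x = h_1(u, v), \qquad y = h_2(u, v), \qquad z = h_3(u, v). \]

Then
\[ dx = \frac{\partial h_1}{\partial u}\,du + \frac{\partial h_1}{\partial v}\,dv, \qquad
dy = \frac{\partial h_2}{\partial u}\,du + \frac{\partial h_2}{\partial v}\,dv, \qquad
dz = \frac{\partial h_3}{\partial u}\,du + \frac{\partial h_3}{\partial v}\,dv. \]
If we set
\[ \frac{\partial(h_i, h_j)}{\partial(u, v)} = \det
\begin{pmatrix}
\dfrac{\partial h_i}{\partial u} & \dfrac{\partial h_j}{\partial u} \\[1ex]
\dfrac{\partial h_i}{\partial v} & \dfrac{\partial h_j}{\partial v}
\end{pmatrix}, \]
we conclude that
\[ H^*(\phi) = \left[ (f_1 \circ H)\,\frac{\partial(h_2, h_3)}{\partial(u, v)} + (f_2 \circ H)\,\frac{\partial(h_3, h_1)}{\partial(u, v)} + (f_3 \circ H)\,\frac{\partial(h_1, h_2)}{\partial(u, v)} \right] du \wedge dv. \]

This can also be written as

\[ H^*(\phi) = (F \circ H) \cdot \left( \frac{\partial H}{\partial u} \times \frac{\partial H}{\partial v} \right) du \wedge dv, \tag{11.3.3} \]

if $F$ denotes the vector function $F = (f_1, f_2, f_3)$ and $\dfrac{\partial H}{\partial u}$ and $\dfrac{\partial H}{\partial v}$ denote the vector
functions obtained by taking the partial derivatives of the component functions
of $H$.
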
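Example 11.3.13. If $\phi = x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy$ and $H : \mathbb{R}^2 \to \mathbb{R}^3$ is
the transformation $H(u, v) = (u, v, u^2 + v^2)$, then find $H^*(\phi)$.
Solution: We express the transformation $H$ as a change of variables
\[ x = u, \qquad y = v, \qquad z = u^2 + v^2. \]
Then $dx = du$, $dy = dv$, and $dz = 2u\,du + 2v\,dv$. Thus
\[ dy \wedge dz = -2u\,du \wedge dv, \qquad dz \wedge dx = -2v\,du \wedge dv, \qquad dx \wedge dy = du \wedge dv. \]
Hence, $H^*(\phi) = (u, v, u^2 + v^2) \cdot (-2u, -2v, 1)\,du \wedge dv = -(u^2 + v^2)\,du \wedge dv$.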
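Formula (11.3.3) lends itself to a direct numerical check of this example. The sketch below is illustrative only: the helper names are hypothetical and the partial derivatives of $H$ are approximated by central differences.

```python
def partials(H, u, v, h=1e-6):
    """Central-difference approximations of dH/du and dH/dv at (u, v)."""
    H_u = [(a - b) / (2 * h) for a, b in zip(H(u + h, v), H(u - h, v))]
    H_v = [(a - b) / (2 * h) for a, b in zip(H(u, v + h), H(u, v - h))]
    return H_u, H_v

def cross(a, b):
    """Classical cross product in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pullback_2form(F, H, u, v):
    """Coefficient of du ^ dv in H*(phi), computed as (F o H) . (dH/du x dH/dv)."""
    H_u, H_v = partials(H, u, v)
    normal = cross(H_u, H_v)
    return sum(a * b for a, b in zip(F(*H(u, v)), normal))

def H(u, v):
    return (u, v, u * u + v * v)    # the transformation of Example 11.3.13

def F(x, y, z):
    return (x, y, z)                # component vector field of phi

# At (u, v) = (1.5, -0.5) the example predicts -(u^2 + v^2) = -2.5.
coeff = pullback_2form(F, H, 1.5, -0.5)
```

The computed coefficient agrees with $-(u^2 + v^2)$ up to rounding error in the difference quotients.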

Finally, there is a composition law for transformations of forms:

Theorem 11.3.14. If $H_1 : U \to V$ and $H_2 : V \to W$ are smooth functions between
open sets, and if $\phi$ is any differential form defined on $W$, then
\[ (H_2 \circ H_1)^*(\phi) = H_1^* \circ H_2^*(\phi). \]

Proof. It follows from the previous theorem that it is enough to check this in the
case when $\phi$ is a function $f$ or the differential of a function (such as the differential
$dx_i$ of one of the coordinate functions $x_i$). In the case of a function $f$, we have

\[ (H_2 \circ H_1)^*(f) = f \circ (H_2 \circ H_1) = (f \circ H_2) \circ H_1 = H_1^*(f \circ H_2) = H_1^*(H_2^*(f)) = H_1^* \circ H_2^*(f). \]

In the case when $\phi$ is the differential $df$ of a function, we have

\[ (H_2 \circ H_1)^*(df) = d(f \circ (H_2 \circ H_1)) = d((f \circ H_2) \circ H_1) = H_1^*(d(f \circ H_2)) = H_1^*(H_2^*(df)) = (H_1^* \circ H_2^*)(df). \]

This completes the proof. $\Box$


Exercise Set 11.3

1. If $\phi = x^2\,dx + xy\,dy$ and $\psi = y\,dx + x^3\,dy$, then find $d\phi$ and $\phi \wedge \psi$.

2. If $\phi = \cos z\,dx + \sin z\,dy + e^y\,dz$ and $\psi = z\,dx + x\,dy + y\,dz$, then find $d\phi$ and
$\phi \wedge \psi$.

3. If $\phi = yz\,dx + xz\,dy + xy\,dz$ and $\omega = z\,dx \wedge dy + x\,dy \wedge dz + y\,dz \wedge dx$, then find
$d\phi$ and $\phi \wedge \omega$.

4. Prove Theorem 11.3.1 parts (a), (b), and (c).

5. Prove Theorem 11.3.1 parts (d) and (e).

6. Prove part (b) of Theorem 11.3.6.




7. Prove Theorem 11.3.7. Hint: Fix a point $a \in U$ and then define $f(x)$ to be
the integral of $\phi$ along the line $[a, x]$; show that $\phi = df$ by using the condition
$d\phi = 0$ and integration by parts.

8. Show that Theorem 11.3.7 does not hold if we don't put some restriction on
the domain $U$. In fact show that if

\[ \phi = \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy \quad\text{on}\quad U = \{(x, y) \in \mathbb{R}^2 : 1/2 < \|(x, y)\| < 2\}, \]

then $\phi$ is closed but not exact on $U$. Hint: Use the result of Exercise 11.1.10.

9. Prove Theorem 11.3.10(a) in the case where $\phi$ is a 2-form or a 3-form in $\mathbb{R}^3$.

10. Prove Theorem 11.3.10(b) in the case where $\phi$ is a 1-form and $\psi$ is a 2-form in
$\mathbb{R}^3$.

11. Prove Theorem 11.3.10(c) in the case where $\phi$ is a 2-form in $\mathbb{R}^3$.

12. Prove that the vector $\dfrac{\partial H}{\partial u} \times \dfrac{\partial H}{\partial v}$ that appears in (11.3.3) is perpendicular to
the surface $H(U)$ at each point $(u, v)$ of this surface.


11.4. Green’s Theorem 


Green’s Theorem relates certain double integrals over a region in the plane to path 
integrals over the boundary of the region. It has a wide variety of applications 
and it generalizes nicely to higher dimensions. In this section, we prove Green's 
Theorem for fairly general regions. We begin with the case where the region is a 
rectangle. 


Green's Theorem on a Rectangle. In its simplest form, Green's Theorem
follows from two applications of the Fundamental Theorem of Calculus: one in the
$x$-direction and one in the $y$-direction.


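Theorem 11.4.1. Let $\phi = f\,dx + g\,dy$ be a 1-form on the rectangle $R = [a, b] \times [c, d]$
and suppose $d\phi$ exists and is continuous and bounded on the interior of $R$. Then

\[ \int_R \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dV(x, y) = \int_{\partial R} \phi, \]

where $\partial R$ is a path which traces the boundary of $R$ once in the counterclockwise
direction.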


Proof. We begin by breaking up the double integral on the left and expressing
each of the resulting terms as an iterated integral using Fubini's Theorem:

\[ \int_R \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dV(x, y)
= \int_c^d \int_a^b \frac{\partial g}{\partial x}(x, y)\,dx\,dy - \int_a^b \int_c^d \frac{\partial f}{\partial y}(x, y)\,dy\,dx. \]

The hypotheses on $\phi$ ensure that the Fundamental Theorem of Calculus applies to
the inner integral in each of the latter iterated integrals. This yields

\[ \int_c^d \bigl( g(b, y) - g(a, y) \bigr)\,dy - \int_a^b \bigl( f(x, d) - f(x, c) \bigr)\,dx \]
\[ = \int_c^d g(b, y)\,dy + \int_b^a f(x, d)\,dx + \int_d^c g(a, y)\,dy + \int_a^b f(x, c)\,dx \]
\[ = \int_{\partial R} \phi, \]

where $\partial R$ is the path obtained by joining together the four straight-line paths along
the edges of $R$ in such a way that the resulting path traverses the boundary of $R$
once in the counterclockwise direction. $\Box$


In the following example, we use Green's Theorem to avoid parameterizing four
different sides of a rectangle $R$ in order to compute a line integral around $\partial R$.

Example 11.4.2. Find $\int_{\partial R} (y^2\,dx + y\ln x\,dy)$ if $R = [1, 2] \times [0, 1]$.

Solution: By Theorem 11.4.1,

\[ \int_{\partial R} (y^2\,dx + y\ln x\,dy) = \int_R (y/x - 2y)\,dV(x, y). \]

By Fubini's Theorem, the latter integral is equal to the iterated integral

\[ \int_0^1 \int_1^2 (y/x - 2y)\,dx\,dy = \int_0^1 (y\ln 2 - 2y)\,dy = \frac{\ln 2}{2} - 1. \]


Integration of 2-Forms in the Plane. It will be helpful to interpret the integral 
of a function over a region in the plane as an integral of a certain 2-form. 

If A is a compact Jordan region in the plane, every 2-form on A is of the form 
f dx A dy where f is a continuous function on A. We define the integral of such a 
2-form over A to be 


\[ \int_A f(x, y)\,dx \wedge dy = \int_A f(x, y)\,dV(x, y). \tag{11.4.1} \]


That is, it is the ordinary Riemann integral in two variables of the function f over 
the set A. The advantages of using the 2-form notation in the integral will become 
apparent below. 


In Example 11.3.2 we showed that if $\phi = f\,dx + g\,dy$ is a differentiable 1-form,
then

\[ d\phi = \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dx \wedge dy. \]

This, together with the above 2-form notation for integrals in $\mathbb{R}^2$, allows us
to rewrite the left side of the equality in Theorem 11.4.1 as $\int_R d\phi$. Then Green's
Theorem on a rectangle becomes


Theorem 11.4.3. If $\phi$ is a 1-form defined on a bounded rectangle $R$ and if $d\phi$ is
continuous and bounded on the interior of $R$, then

\[ \int_R d\phi = \int_{\partial R} \phi. \]




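Proof. By (11.4.1) and the previous theorem, we have

\[ \int_R d\phi = \int_R \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dx \wedge dy
= \int_R \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dV = \int_{\partial R} \phi. \qquad \Box \]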


Change of Variables for Integrals of 2-forms. Using the 2-form notation for
integrals in $\mathbb{R}^2$ also turns the change of variables formula for such integrals into a
natural formula involving the transformation law for 2-forms, as discussed in the
previous section.


Theorem 11.4.4. Let $H = (h_1, h_2)$ be a continuous transformation from the open
Jordan region $U$ in $\mathbb{R}^2$ to another Jordan region in $\mathbb{R}^2$ and suppose $H$ is one-to-one
and smooth with non-singular differential on $U$. If $\phi$ is a 2-form on $H(U)$ with $\phi$
bounded on $H(U)$ and $H^*(\phi)$ bounded on $U$, then

\[ \int_{H(U)} \phi = \int_U H^*(\phi), \]

provided $\det(dH) > 0$ everywhere on $U$. If $\det(dH) < 0$ on $U$, equality holds if the
right side of the equation is replaced by its negative.

Proof. Let $\phi(x, y) = f(x, y)\,dx \wedge dy$. By Example 11.3.11,

\[ H^*(\phi) = f \circ H\,\det(dH)\,du \wedge dv. \tag{11.4.2} \]

Recall that the differential $dH$ of the transformation $H$ is the linear transformation
with matrix

\[ \begin{pmatrix}
\dfrac{\partial h_1}{\partial u} & \dfrac{\partial h_1}{\partial v} \\[1ex]
\dfrac{\partial h_2}{\partial u} & \dfrac{\partial h_2}{\partial v}
\end{pmatrix}. \]

The hypotheses of the theorem ensure that the change of variables formula
(Theorem 10.5.14) applies. If the determinant $\det(dH)$ is everywhere non-negative on
$U$, then it implies

\[ \int_{H(U)} f(x, y)\,dx \wedge dy = \int_U f \circ H(u, v)\,|\det(dH)(u, v)|\,du \wedge dv \]
\[ = \int_U f \circ H(u, v)\,\det(dH)(u, v)\,du \wedge dv = \int_U H^*(\phi). \tag{11.4.3} \]

That is,
\[ \int_{H(U)} \phi = \int_U H^*(\phi). \]

If $\det(dH)$ is everywhere non-positive, then $|\det(dH)| = -\det(dH)$ and the right
side of the above equation is replaced by its negative. $\Box$


2-cells. In order to extend Green's Theorem to a much larger class of integrals,
we need to change our point of view regarding integrals of 2-forms. We have
discussed in previous sections the integration of 1-forms over paths. A path is not a
set, but rather a function from an interval into $\mathbb{R}^q$, although we sometimes ignore
the distinction between the path and the set which is its trace in $\mathbb{R}^q$. There is
a similar and highly useful formulation for integration of 2-forms. We define the
integral of a 2-form over an object which is not a set, but rather a two-dimensional
analogue of a smooth path. A 2-cell, as defined below, is such an object.


In what follows, $I^2$ will denote the square $[0, 1] \times [0, 1]$ in $\mathbb{R}^2$. The boundary
path $\partial I^2$ is the path consisting of the straight-line paths along the edges of $I^2$ joined
together so as to traverse the topological boundary of $I^2$ in the counterclockwise
direction.


We will say the function $E : I^2 \to \mathbb{R}^q$ is smooth on $I^2$ if each of its first-order
partial derivatives exists and is continuous on $I^2$. It is clear what this means on the
interior of $I^2$. On each edge and corner of $I^2$ one or both of the partial derivatives
must be interpreted as a one-sided derivative. Thus, at each point of $I^2$ we require
that the appropriate one- or two-sided derivative exists and we require that the
resulting functions on $I^2$ be continuous. With this understanding, we make the
following definition.


Definition 11.4.5. A 2-cell in $\mathbb{R}^q$ is a smooth function $E$ from $I^2$ into $\mathbb{R}^q$.
We will say that a 2-cell $E$ is simple if, on the interior of $I^2$, $E$ is one-to-one
and $\det(dE)$ is non-vanishing. If, in addition, $\det(dE) > 0$ on the interior of $I^2$, we
will say that $E$ is positively oriented. We will say that $E$ is negatively oriented if
$\det(dE) < 0$ on the interior of $I^2$.

In this section we will only be concerned with 2-cells in $\mathbb{R}^2$. In the next section,
2-cells in higher-dimensional spaces will become important.

Note also that the conditions on a cell $E$ ensure that the restriction of $E$ to
each of the four edges of $\partial I^2$ is a smooth curve and, hence, that $\partial E = E \circ \partial I^2$ is
piecewise smooth; that is, it is a path. We will call this the boundary of the cell $E$.

The image $E(I^2)$ of a 2-cell $E$ is called the trace of $E$. As was the case with
curves and paths, a 2-cell consists of not only the set $E(I^2)$ but also a
parameterization $E$ of that set, with the parameters being the coordinates of points in $I^2$.


In general, a path may cross itself, retrace portions of itself, or even be constant
over portions of its parameter interval. However, a simple path can do none of these
things. A simple path is one-to-one and has non-vanishing derivative on the interior
of its parameter interval. Similarly, a simple 2-cell is one-to-one with non-singular
differential on the interior of $I^2$.


Note that if $E$ is a 2-cell, then $\partial E$ is a path, not a set, and so it is not the same
thing as the topological boundary of $E(I^2)$ even though we use the same notation
to denote it. Which is meant should be obvious from the context. Sometimes the
trace of $\partial E$ is the same as the topological boundary of the trace of $E$, but not
always (see Figure 11.4.1).


Orientation for Paths. A path which traverses the boundary of a set such as
a square or circle in the counterclockwise direction has a property which can be
generalized in a useful way.

An ordered basis in $\mathbb{R}^2$ is a linearly independent ordered pair $\{u, v\}$ of vectors
in $\mathbb{R}^2$. An ordered basis is said to be positively oriented if the angle $\theta$ between the
two vectors, measured from $u$ to $v$, satisfies $0 < \theta < \pi$; that is, if $\sin\theta > 0$. Think
of this as meaning that $v$ points to the left of $u$. This happens if and only if the
determinant of the matrix with $u$ as first column and $v$ as second column is positive
(Exercise 11.4.8).

[Figure 11.4.1. Simple, Positively Oriented Cells in $\mathbb{R}^2$ (panels A and B).]
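The determinant criterion is easy to apply in practice. The sketch below is an illustration only; the function name `is_positively_oriented` is hypothetical, and the code simply evaluates the $2 \times 2$ determinant with $u$ as first column and $v$ as second column.

```python
def is_positively_oriented(u, v):
    """True if the ordered pair (u, v) of vectors in R^2 is a positively oriented basis.

    Uses the determinant of the matrix with u as first column and v as second column.
    A zero determinant means the pair is not a basis at all, so it returns False.
    """
    return u[0] * v[1] - u[1] * v[0] > 0

standard = is_positively_oriented((1.0, 0.0), (0.0, 1.0))   # v points to the left of u
swapped = is_positively_oriented((0.0, 1.0), (1.0, 0.0))    # same vectors, opposite order
dependent = is_positively_oriented((1.0, 2.0), (2.0, 4.0))  # linearly dependent pair
```

Swapping the two vectors changes the sign of the determinant, so exactly one of the two orderings of a basis is positively oriented.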

At each smooth point of $\partial I^2$ (at points which are not corners), the tangent
vector $T$ to the path is defined. Furthermore, if $v$ is any vector for which $(T, v)$ is
a positively oriented ordered basis, then $tv$ belongs to $I^2$ for all sufficiently small
positive $t$. In other words, the set $I^2$ lies on the left as we traverse $\partial I^2$. It turns
out that this property is preserved by a positively oriented 2-cell, due to the fact
that $dE$ takes a positively oriented basis to a positively oriented basis. That is, at
each smooth point $a$ of the path $\partial E$, if the tangent vector $T$ to the path at $a$ and a
vector $v$ form a positively oriented pair $(T, v)$, then each sufficiently small positive
multiple of $v$ lies in $E(I^2)$ (we won't prove this here). Intuitively, this means that
as we traverse $\partial E$, the set $E(I^2)$ lies on the left (see Figure 11.4.1). If the cell is
negatively oriented, the set $E(I^2)$ lies on the right as we traverse the path $\partial E$;
that is, the orientation of the boundary path is reversed by $E$.


Example 11.4.6. Give an example of a simple, positively oriented 2-cell which
has as its trace the unit disc $D = \{(x, y) : x^2 + y^2 \le 1\}$.
Solution: There are many ways to do this. One way is to use the polar
coordinate parameterization:
\[ E(r, t) = (r\cos(2\pi t), r\sin(2\pi t)) \quad\text{for } (r, t) \in I^2. \]
This is illustrated in Figure 11.4.1B. We have
\[ dE = \begin{pmatrix} \cos(2\pi t) & -2\pi r\sin(2\pi t) \\ \sin(2\pi t) & 2\pi r\cos(2\pi t) \end{pmatrix} \]
and this has determinant $2\pi r$, which is positive on the interior of $I^2$. This
parameterization is clearly one-to-one on the interior of $I^2$. Hence, $E$ is a simple, positively
oriented 2-cell.


Note that part of the trace of the boundary $\partial E$ of this cell does not actually
lie on the boundary of the trace of $E$, but in its interior, and this part of the trace
of $\partial E$ is traversed twice, once in each direction. Also, over part of its parameter
interval, $\partial E$ is constant (the part corresponding to the side $r = 0$ of $\partial I^2$). Our
definition of a simple cell does not rule out this kind of behavior.


Integration over a Cell. Just as we defined the integral of a 1-form over a path 
in Section 11.1, we may now define the integral of a 2-form over a 2-cell. 


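Definition 11.4.7. If $E$ is a 2-cell in $\mathbb{R}^2$ and $\omega = f\,dx \wedge dy$ is a 2-form defined on
the trace of $E$, then we define the integral of $\omega$ over $E$ to be

\[ \int_E \omega = \int_{I^2} E^*(\omega). \]

Note that the integral on the right in this definition exists. To see this, let
$\omega = f\,dx \wedge dy$ and $E(u, v) = (e_1(u, v), e_2(u, v))$. Then

\[ E^*(\omega) = f \circ E \left( \frac{\partial e_1}{\partial u}\frac{\partial e_2}{\partial v} - \frac{\partial e_1}{\partial v}\frac{\partial e_2}{\partial u} \right) du \wedge dv. \]

By the definition of a 2-cell, the function multiplying $du \wedge dv$ in this expression is
continuous on $I^2$.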


Integration over a Simple Cell. The image of the interior of I² under a simple
cell E is an open subset of the trace of E by Exercise 9.6.8. It follows that the
boundary of the trace of E is contained in the trace of ∂E. The trace of a path has
zero area (Exercise 11.4.7). This implies that the trace of ∂E has zero area and,
hence, that the trace of E and the image under E of the interior of I² are Jordan
regions which differ by a set of area 0. Furthermore, a simple cell, restricted to the
interior of I², satisfies the conditions of the change of variables formula given in
Theorem 11.4.4. This leads to the following theorem:


Theorem 11.4.8. If E is a simple, positively oriented 2-cell with trace A = E(I²)
and if ω = f dx ∧ dy is a 2-form defined on A, then
\[ \int_E \omega = \int_A f\,dV. \]

Proof. This follows immediately from Theorem 11.4.4. □


Thus, in this case, which is the case of greatest interest, the integral of the form ω
over the cell E is just the integral of a function f over a Jordan region A.


Change of Parameter. Just as with integrals of 1-forms, there is a sense in
which the integral of a 2-form over a 2-cell is independent of the parameterization
of the 2-cell. If E and F are 2-cells, then we will say that F is related to E by a
smooth change of parameter if there is a smooth one-to-one function H from the
interior of I² to itself, with non-singular differential, such that F = E∘H on the
interior of I². The smooth change of parameter H is said to be positively oriented
if det(dH) > 0 on the interior of I² and negatively oriented if det(dH) < 0.


Theorem 11.4.9. If E and F are 2-cells which are related by a smooth change of
parameter H in the above sense, then
\[ \int_F \omega = \int_E \omega \]
if H is positively oriented and ω is any 2-form defined on E(I²). This equation
holds with the right side replaced by its negative if H is negatively oriented.


Proof. We have
\[ \int_F \omega = \int_{I^2} F^*(\omega) = \int_{I^2} H^*(E^*(\omega)) = \int_{I^2} E^*(\omega) = \int_E \omega \]
by Theorem 11.3.14 and Theorem 11.4.4. □


Green’s Theorem on a Cell. We can now extend Green’s Theorem to integrals 
over a 2-cell. 


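Theorem 11.4.10 (Green's Theorem). If E is a 2-cell in R² and φ is a smooth
1-form on a neighborhood of the trace of E, then
\[ \int_{\partial E} \phi = \int_E d\phi. \]

Proof. We have
\[ \int_{\partial E} \phi = \int_{\partial I^2} E^*(\phi) = \int_{I^2} d(E^*(\phi)) = \int_{I^2} E^*(d\phi) = \int_E d\phi, \]
by Green's Theorem on a rectangle and Theorem 11.3.10(c). □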


Remark 11.4.11. The cell E in the above version of Green's Theorem is not
required to be positively oriented or even simple. Thus, the path ∂E may not be
positively oriented and the integral of dφ = f dx ∧ dy over E may not be the usual
two-dimensional integral of f over the trace of E (it will be its negative if E is
negatively oriented). On the other hand, if E is simple and positively oriented,
then the integral on the right is the usual two-dimensional Riemann integral of f
over the trace of E by Theorem 11.4.8.


Remark 11.4.12. The concepts of a cell E and its boundary ∂E are useful in
stating and proving Green's Theorem. In actually computing one side or another of the
equality in Green's Theorem it is often convenient to switch to a different
parameterization of the trace of E or ∂E. This is legitimate as long as the appropriate
change of parameter theorem applies. The idea is to choose a parameterization
that makes the computation of the integral as easy as possible. In fact, we do this
in each of the following examples.


Example 11.4.13. Let A be the compact set bounded by the ellipse described
parametrically by γ(t) = (a cos t, b sin t), 0 ≤ t ≤ 2π. Use Green's Theorem to find
the area of A.

Solution: The 2-form dx ∧ dy is dφ, where φ = x dy. Along γ, x = a cos t, and
dy = b cos t dt. Thus, by Green's Theorem, the area we seek is
\[ \int_A dx\wedge dy = \int_\gamma x\,dy = \int_0^{2\pi} ab\cos^2 t\,dt = \pi ab. \]


Note that the set A is the trace of a 2-cell (Exercise 11.4.3), but we do not need to
explicitly find the cell E : I² → A that expresses it as such. If we did find such an
E, it is unlikely that the path γ that we used here would be exactly equal to ∂E.
However, γ and ∂E will necessarily be equivalent paths, provided E is chosen so
that ∂E is a path which traverses ∂A once in the positive direction.

Figure 11.4.2. The Annulus as a Cell.
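As a sanity check on Example 11.4.13, the line integral of x dy along the ellipse can be approximated numerically and compared with πab. This is a minimal sketch, assuming the sample semi-axes a = 3 and b = 2 and a midpoint rule with `n` subintervals; the function name `ellipse_area_by_green` is introduced here purely for illustration.

```python
import math

def ellipse_area_by_green(a, b, n=20_000):
    """Midpoint-rule approximation of the integral of x dy along
    gamma(t) = (a cos t, b sin t), 0 <= t <= 2*pi."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h            # midpoint of the k-th subinterval
        x = a * math.cos(t)          # x-coordinate along the path
        dy_dt = b * math.cos(t)      # derivative of y = b sin t
        total += x * dy_dt * h
    return total

area = ellipse_area_by_green(3.0, 2.0)
print(area, math.pi * 3.0 * 2.0)     # the two values should nearly coincide
```

Note that only the boundary path γ is used; no 2-cell with trace A is ever constructed, which is the point made in the remark following the example.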


Often the topological boundary of the trace of a cell E is not ∂E and, in fact,
it may not even be the trace of a single path. It could be the union of the traces of
several paths. Properly interpreted, Green's Theorem still applies, but the integral
over the boundary is the sum of integrals over these several paths. The annulus in
the following example illustrates this fact, among other things.

Example 11.4.14. For the annulus
A = {(x,y) : 1 ≤ x² + y² ≤ 4},
show that the integral over ∂A of the 1-form
\[ \phi = -\frac{y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy \]
is 0 by using Green's Theorem. Then directly calculate the integral of φ over the
circle x² + y² = 4. Why doesn't Green's Theorem also imply that this integral is 0?


Solution: Figure 11.4.2A illustrates how to express the annulus as the trace of
a cell (finding an explicit parameterization that does this is Exercise 11.4.5). The
boundary path of this cell has two overlapping horizontal sections that are oriented
in opposite directions. The integrals along these sections will cancel each other,
leaving only the integrals around the two circles which comprise the topological
boundary of A. One of these is traversed counterclockwise and the other clockwise
(Figure 11.4.2B). To calculate the resulting integral of φ along ∂A, we note that
\[ d\phi = \left(\frac{y^2-x^2}{(x^2+y^2)^2} + \frac{x^2-y^2}{(x^2+y^2)^2}\right) dx\wedge dy = 0. \]
Thus, by Green's Theorem,
\[ \int_{\partial A} \phi = \int_A d\phi = 0. \]




On the other hand, a direct calculation of the integral of φ over the outer circle
x² + y² = 4 can be done using the parameterization γ(t) = (2 cos t, 2 sin t) of this
curve on [0, 2π]. The result is
\[ \int_\gamma \left(-\frac{y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy\right) = \int_0^{2\pi} (\sin^2 t + \cos^2 t)\,dt = 2\pi. \]
If Green's Theorem applied, the integral would be 0, since dφ = 0. The reason why
Green's Theorem does not apply in this case is that the circle x² + y² = 4 is not
the boundary of a set on which φ is a smooth 1-form. The form φ has a singularity
at (0,0). On the other hand, the point (0,0) is not in the annulus A and so it does
not cause a problem in applying Green's Theorem to A and ∂A.
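The two computations in Example 11.4.14 can be compared numerically. The sketch below is only an illustration under stated assumptions: it approximates the integral of φ around a circle of given radius by a midpoint rule, and the names `integrate_phi_on_circle`, `outer`, `inner`, and `boundary_of_annulus` are introduced here and do not come from the text.

```python
import math

def integrate_phi_on_circle(radius, n=10_000):
    """Approximate the integral of phi = (-y dx + x dy)/(x^2 + y^2)
    counterclockwise around the circle x^2 + y^2 = radius^2."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)   # derivatives in t
        total += (-y * dx + x * dy) / (x * x + y * y) * h
    return total

outer = integrate_phi_on_circle(2.0)    # outer circle, counterclockwise
inner = integrate_phi_on_circle(1.0)    # inner circle, counterclockwise
# In the boundary of the annulus the inner circle is traversed clockwise,
# so its contribution enters with a minus sign.
boundary_of_annulus = outer - inner
print(outer, inner, boundary_of_annulus)
```

Each circle alone contributes approximately 2π, while the oriented boundary of the annulus contributes approximately 0, in agreement with the discussion above.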


Classical Version of Green's Theorem. If φ = P dx + Q dy is a differential
1-form and if γ = (γ₁, γ₂) : I → R² is a path in the domain of φ, then
\[ \int_\gamma \phi = \int_I \gamma^*(\phi) = \int_I \big((P\circ\gamma)\,\gamma_1' + (Q\circ\gamma)\,\gamma_2'\big)\,dt. \]
Classical notation for this integral is as follows: the differential form φ has
component vector field F = (P,Q). The tangent vector to the curve γ is T = γ'/||γ'||.
We write
\[ \int_\gamma \phi = \int_\gamma F\cdot T\,ds, \]
where ds = ||γ'(t)|| dt is the differential of length along the path γ.

By Remark 11.3.8, if φ is a 1-form in R³ with component vector field F, then
dφ = curl F. The same statement holds in R² if the curl of a vector field (P,Q) is
understood to be ∂Q/∂x − ∂P/∂y.

With this notation, the classical version of Green’s Theorem is as follows: 
Theorem 11.4.15. Let A be a closed Jordan region in R² with topological boundary
which is the image of a path ∂A, positively oriented with respect to A. If F is a
smooth vector field on A, T is the vector function which is the tangent vector to ∂A
at each point of ∂A, and ds is the differential of arc length along ∂A, then
\[ \int_{\partial A} F\cdot T\,ds = \int_A \operatorname{curl} F\,dV. \]


In this section, we have essentially proved this version of the theorem in the 
case where A is the trace of a simple, positively oriented cell. Our proof also yields 
a proof of the above theorem in the case where A can be cut up into finitely many 
pieces which are traces of simple oriented cells (see Exercise 11.4.13). 


Example 11.4.16. Find ∫_C F·T ds for the function F(x,y) = (cos(ln |x|) + y, xy²)
and the curve C = {(x,y) ∈ R² : x² + y² = 1}.

Solution: Green's Theorem tells us that the above integral is the same as
\[ \int_{x^2+y^2\le 1} \left(\frac{\partial}{\partial x}(xy^2) - \frac{\partial}{\partial y}\big(\cos(\ln|x|) + y\big)\right) dV(x,y) = \int_{x^2+y^2\le 1} (y^2 - 1)\,dV(x,y). \]
We calculate the latter integral using polar coordinates. The result is
\[ \int_0^{2\pi}\!\!\int_0^1 (r^3\sin^2\theta - r)\,dr\,d\theta = -3\pi/4. \]
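The value −3π/4 can be confirmed by a direct numerical evaluation of the double integral in polar coordinates. The following sketch assumes a simple midpoint rule on a uniform grid; the function name `integrate_curl_over_unit_disc` and the grid sizes are choices made here for illustration only.

```python
import math

def integrate_curl_over_unit_disc(n_r=400, n_theta=400):
    """Midpoint rule for the integral of (y^2 - 1) over the unit disc,
    computed in polar coordinates."""
    hr = 1.0 / n_r
    ht = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_theta):
            theta = (j + 0.5) * ht
            y = r * math.sin(theta)
            # The extra factor r is the Jacobian of polar coordinates.
            total += (y * y - 1.0) * r * hr * ht
    return total

value = integrate_curl_over_unit_disc()
print(value, -3.0 * math.pi / 4.0)   # the two values should be close
```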


Exercise Set 11.4

1. If R is a rectangle of width a and height b, then use Green's Theorem to find


lon. 


. Use Green’s Theorem to find f5;.(y?« dx + «?y dy). 
3. Show that x = a cos(πt), y = b(2s − 1) sin(πt), (s,t) ∈ I², gives an explicit
parameterization as a simple, orientation-preserving 2-cell E for the ellipse A
of Example 11.4.13. Show that ∂E traverses ∂A once in the positive direction.
Explain why this path yields the same integral for a 1-form on A as does the
path γ of the example.

4. Using the parameterization E given in the preceding exercise, calculate the area
of the ellipse of Example 11.4.13 by directly calculating ∫_E dx ∧ dy.

5. Find an explicit parameterization for the 2-cell in Figure 11.4.2A that has the
annulus of Example 11.4.14 as its trace.


. Use Green’s Theorem to find f,,(y* da — «*dy) if A is the annulus of the 


previous exercise, 


7. Prove that the trace of a path in R² is a set of area zero (see Exercises 10.2.6
and 10.2.8).

8. Verify the claim, made in the discussion of orientation for paths, that an ordered
pair {v, w} of vectors in R² forms a positively oriented basis if and only if the
matrix with v as first column and w as second column has positive determinant.

9. Prove that a 2×2 matrix takes a positively oriented basis to a positively oriented
basis if and only if it has positive determinant.

10. Use Green's Theorem to calculate ∫_{∂D} (xy dx + (x + ln(2 + y)) dy), where D is
the unit disc.

11. If E is a simple positively oriented cell in R², with trace A, find a formula which
expresses the area of A as an integral around ∂E. Is there more than one way
to do this?

12. Use the result of the previous exercise to find the area of the region in R²
enclosed by the path x = cos t, y = sin 2t, −π/2 ≤ t ≤ π/2.

13. Suppose A and ∂A satisfy the hypotheses of Theorem 11.4.15. Suppose that
A may be written as the union of finitely many sets of the form B_j = Im(E_j),
where each E_j is a simple positively oriented cell and any two of the sets B_j
intersect only at common boundary points. Explain why it is reasonable to
think that the sum of the integrals of a 1-form φ along the paths ∂E_j is equal
to the integral of φ along ∂A.

14. Let U be an open set in R² and let a and b be points of U. We say that two
paths γ₀ and γ₁, both of which begin at a and end at b, are homotopic in U if
there is a cell E : I² → U such that E(s,0) = a, E(s,1) = b, E(0,t) = γ₀(t),
and E(1,t) = γ₁(t) for all s,t ∈ [0,1]. Show that if φ is a 1-form with dφ = 0
on U, then
\[ \int_{\gamma_0} \phi = \int_{\gamma_1} \phi \]
whenever γ₀ and γ₁ are homotopic paths joining a to b. Conclude that if any
two paths joining the same two points of U are homotopic and if dφ = 0, then
∫_γ φ depends only on the endpoints of γ and not on the path joining these
endpoints.

15. Show that if U is a convex open subset of R² and if a and b are points of U,
then any two smooth paths joining a to b are homotopic in U (see the previous
exercise).


11.5. Surface Integrals and Stokes’s Theorem 


This section is devoted to the study of integration on two-dimensional surfaces in
R^d and to generalizations of Green's Theorem to this context.

We begin with a discussion of integration over parameterized surfaces. We discuss
the concepts of surface area and orientation for parameterized surfaces and
prove that these notions are essentially independent of the choice of parameterization.
We then specialize to the case where the parameterized surface is a 2-cell in
R^d and prove Stokes's Theorem. This is a generalization of Green's Theorem to
the case where the 2-cell has its trace in R^d for d ≥ 3.

In the next section we will generalize Green's Theorem to the case of a 3-cell
in R³ (Gauss's Theorem) or, more generally, a 3-cell in R^d for d ≥ 3.

These results do not require many new ideas. Most of what we need has already 
been encountered in our study of Green’s Theorem in the previous section. 


Not every geometric object that we might wish to integrate over can be expressed
as the trace of a cell. To exploit the full power of these theorems, we will
need to consider objects which are constructed by piecing together cells, much as
we dealt with piecewise smooth paths in previous sections. This will be done in the
final section of this chapter.


Integration over a Parameterized Surface. A smoothly parameterized surface 
is the two-dimensional analogue of a smooth path. 


Definition 11.5.1. A parameterized 2-surface in R^d is a continuous function
H : U → R^d from an open set U ⊂ R² into R^d. It is a smoothly parameterized surface
if H is one-to-one and smooth, with a differential dH which has rank 2 at each
point of U. The image of a smoothly parameterized 2-surface is called its trace.


The definition given here differs slightly from Definition 9.4.7 in that, here, a
specific parameter function H is part of the definition.

The integral of a 2-form over a smoothly parameterized surface follows the
pattern of the definitions of integration of 1-forms over paths and of 2-forms over
2-cells in R².


Definition 11.5.2. If U is a Jordan region in R², H : U → R^d (d ≥ 2) is a
smoothly parameterized surface in R^d, and ω is a 2-form defined on A = H(U)
with H*(ω) bounded on U, then we define the integral of ω over H to be
\[ \int_H \omega = \int_U H^*(\omega). \]


The condition that H*(ω) be bounded in the above definition is needed to
ensure that the integral on the right exists (the continuity of ω and smoothness of
H ensure that H*(ω) is continuous). Note that if F is the component vector field
of ω, then H*(ω) is a 2-form on U which is the inner product of F∘H with a vector
consisting of determinants of 2 × 2 submatrices of dH (see Example 11.3.12). It
follows that the condition that H*(ω) be bounded in the above definition will be
satisfied if dH and ω are both bounded.


Remark 11.5.3. Often the parameter function H will actually be defined and
continuous on a compact Jordan region A with U as its interior and the 2-form ω
will be continuous on H(A). This guarantees that ω is bounded on H(U). If dH
extends to be continuous on the compact set A, then we are also guaranteed that
dH will be bounded. Note that, in this case, it does not matter whether or not the
integral on the right above is taken over U = A° or over A, since ∂A has area 0 (A
is a Jordan region).

Example 11.5.4. If A = {(x,y) : x ≥ 0, y ≥ 0, x + y ≤ 1} and the parameterized
surface H : A → R³ is defined by H(x,y) = (x, y, x − 2y + 5), then find ∫_H ω if
ω = −y dy ∧ dz + x dz ∧ dx.

Solution: In this example, the parameterization H actually expresses the
surface as the graph of a function defined on the triangle A. That is, H expresses
the variables (x, y, z) in terms of (x,y) by x = x, y = y, z = x − 2y + 5. Under H*,
the differentials dx and dy remain unchanged, while H*(dz) = dx − 2dy. Thus,

H*(ω) = −y dy ∧ (dx − 2dy) + x (dx − 2dy) ∧ dx = (2x + y) dx ∧ dy.

Thus,
\[ \int_H \omega = \int_U (2x+y)\,dx\wedge dy = \int_0^1\!\!\int_0^{1-x} (2x+y)\,dy\,dx = 1/2. \]

Parameter Independence.


Definition 11.5.5. Let H : U → R^d and J : V → R^d be smoothly parameterized
surfaces. If P : V → U is a smooth one-to-one function with det(dP) either strictly
positive or strictly negative on V, then we will say that P is a smooth parameter
change from H to J provided J = H∘P. If det(dP) > 0, we will say that P is
positively oriented, while if det(dP) < 0, we will say that P is negatively oriented.


Note that if there is a smooth parameter change from H to J, then H(U) = J(V).
That is, H and J have the same trace.


The theorem on independence of parameterization (Theorem 11.4.9) holds in 
this more general context. The proof is the same. 


Theorem 11.5.6. Let H : U → R^d and J : V → R^d be smoothly parameterized
surfaces. If there is a smooth parameter change P from H to J, then
\[ \int_H \omega = \pm\int_J \omega \]
if ω is any bounded 2-form on H(U) = J(V). The sign in this identity is positive
if P is positively oriented and it is negative if P is negatively oriented.


This theorem often allows us to simplify an integration problem by choosing a 
more convenient parameterization than the one given. 


Example 11.5.7. Find a smooth parameter change which expresses the integral 
in Example 11.5.4 as an integral over a square rather than a triangle. Then do the 
integration. 


Solution: We set P(u,v) = (u, (1−u)v). Then P is a one-to-one function
from the interior of the square I² onto the interior of the triangle A and its
differential is
\[ dP = \begin{pmatrix} 1 & 0 \\ -v & 1-u \end{pmatrix}, \]
which has determinant 1 − u. This is positive on the interior of I² and so it
determines a positively oriented smooth parameter change. Since we have H(x,y) =
(x, y, x − 2y + 5), the new parameterized surface J = H∘P is
J(u,v) = (u, (1−u)v, u − 2(1−u)v + 5) = (u, v − uv, u − 2v + 2uv + 5),
that is, the surface obtained by setting x = u, y = v − uv, z = u − 2v + 2uv + 5.
Then dx = du, dy = −v du + (1−u) dv, and dz = (1 + 2v) du + 2(u−1) dv. Since
ω = −y dy ∧ dz + x dz ∧ dx, this implies
\[ \begin{aligned} J^*(\omega) &= -(v-uv)\,\big(-v\,du + (1-u)\,dv\big)\wedge\big((1+2v)\,du + 2(u-1)\,dv\big) \\ &\quad - u\,du\wedge\big((1+2v)\,du + 2(u-1)\,dv\big) \\ &= \big((v-2)u^2 + 2(1-v)u + v\big)\,du\wedge dv. \end{aligned} \]
The integral of ω over J is then
\[ \int_J \omega = \int_{I^2} J^*(\omega) = \int_0^1\!\!\int_0^1 \big((v-2)u^2 + 2(1-v)u + v\big)\,du\,dv = 1/2. \]


This is not a case where changing the parameterization simplifies the integration. 


Orientation. A smoothly parameterized surface comes equipped with a natural
orientation. What do we mean by this? It will turn out to be important.


We begin by discussing the concept of orientation for R². The ordered pair of
vectors (1,0), (0,1) is an ordered basis for this vector space. If we choose another
ordered pair of basis vectors (a,b), (c,d), then
\[ \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c \\ d \end{pmatrix}. \]
Thus, the matrix
\[ A = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \]
transforms the ordered basis (1,0), (0,1) to a new ordered basis (a,b), (c,d).

Now the matrix A must be non-singular since (a,b) and (c,d) are linearly
independent. This means that det A ≠ 0. However, det A may be positive or it
may be negative. This means that the possible ordered bases for R² fall into two
classes: those for which det A is positive and those for which det A is negative.
A pair of ordered bases that fall into the same class are said to have the same
orientation while a pair which fall into different classes are said to have opposite
orientation. If we fix an ordered basis, then any other ordered basis is said to have
positive orientation or negative orientation (relative to the fixed ordered basis)
depending on whether it has the same orientation as the fixed basis or the
opposite one.


Example 11.5.8. For the following ordered pairs of basis vectors, tell which have
the positive orientation and which have negative orientation with respect to the
standard ordered basis (1,0), (0,1):

(1) (0,1), (1,0);
(2) (0,−1), (1,0);
(3) (1,1), (−1,1).
Solution: We have
\[ \det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1, \qquad \det\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = 1, \qquad \det\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = 2. \]
Thus, the first pair has negative orientation while the second and third pairs have
the positive orientation with respect to the standard pair.
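The determinant test for orientation is easy to mechanize. The sketch below is illustrative only: the function name `orientation` is introduced here, and the third pair is taken to be (1,1), (−1,1), which is an assumption about the intended example.

```python
def orientation(v, w):
    """Return +1 or -1 according to the sign of the determinant of the
    matrix whose first column is v and whose second column is w."""
    det = v[0] * w[1] - w[0] * v[1]
    if det == 0:
        raise ValueError("v and w are linearly dependent, so they do not form a basis")
    return 1 if det > 0 else -1

# Ordered pairs of basis vectors, compared with the standard basis (1,0), (0,1).
pairs = [((0, 1), (1, 0)), ((0, -1), (1, 0)), ((1, 1), (-1, 1))]
signs = [orientation(v, w) for v, w in pairs]
print(signs)
```

A result of −1 means the pair has negative orientation relative to the standard ordered basis, and +1 means positive orientation.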


Of course specifying a coordinate system for the plane as well as a choice of
ordering of the coordinate axes is the same as specifying an ordered basis. Thus, an
orientation of the plane is determined by a choice of an ordered coordinate system.


Specifying an orientation on the plane is also equivalent to specifying a positive
direction of rotation about a point. A non-zero rotation of magnitude less than π/2
is positive if it moves the positive x-axis toward the positive y-axis.


Surfaces and Orientation. A smooth p-surface S in R^d is a subset of R^d which
is locally a smoothly parameterized p-surface. This means that at each point s ∈ S
there is a neighborhood U of s in R^d such that S ∩ U has a smooth parameterization.


Our main concern in this section is with 2-surfaces. They will be referred to 
simply as surfaces. 

A smoothly parameterized 2-surface has a natural orientation. That is, if
H : U → R^d is the map which parameterizes the surface S and if a ∈ U, b = H(a), then
the linear transformation dH : R² → R^d maps R² onto a two-dimensional linear
subspace L of R^d and it maps the standard basis (1,0), (0,1) onto an ordered basis
for L. Note that b + L is the tangent space of S at the point b. This ordered basis
defines an orientation on L. This is what we mean by the orientation of the surface
S at the point b = H(a). Because H is smooth, the space L and the ordered pair of
basis vectors vary in a continuous fashion as the point b moves about the surface S.


Suppose H₁ : R² → R^d is another smoothly parameterized surface, with image
S₁ which is equal to S in some neighborhood U of b. Then U ∩ S = U ∩ S₁ is a surface
with two different parameterizations. These parameterizations may determine the
same orientation for the surface at b or opposite orientations. That is, the notion
of orientation of a surface at a point depends on the choice of parameterization for
the surface in a neighborhood of this point. This discussion leads to the following
definition.


Definition 11.5.9. An orientation of a smooth surface S at a point b ∈ S is the
orientation class of a pair of basis vectors for the vector space L, where b + L is
the tangent space of S at b. An orientation for S itself is a choice of orientation
for S at each of its points b in such a way that ordered basis vectors defining this
orientation may be chosen in a continuously varying fashion as b moves over S. An
orientable surface is one which may be given an orientation.


Surfaces in 3-Space. If H is a smoothly parameterized 2-surface in R³ with
parameter set U and trace S, then the images under dH of the basis vectors (1,0)
and (0,1) are the first and second columns of the matrix dH. They may also be
described as the vectors ∂H/∂u and ∂H/∂v. They constitute an ordered pair of
basis vectors for the vector space L such that H(u,v) + L is the tangent space of S at
H(u,v). As the points (u,v) range over U, they determine an orientation of S. The
cross product ∂H/∂u × ∂H/∂v of these vectors is often called the normal vector to
the parameterized surface and is denoted N_H. This is a vector orthogonal to the
vectors ∂H/∂u and ∂H/∂v and it varies continuously with the point (u,v) ∈ U.
The cross product of any ordered basis of vectors in L will have the same or opposite
direction as N_H, depending on whether or not the ordered basis determines the same
orientation as (∂H/∂u, ∂H/∂v). In other words, the direction of N_H at a point on
the surface determines the orientation of the ordered pair (∂H/∂u, ∂H/∂v) and,
hence, the orientation of S at that point. The following theorem follows from this
observation.


Theorem 11.5.10. An orientation on a surface S in R³ is determined by a
continuous function which assigns to each point of S a non-zero vector orthogonal to
the tangent space of S at that point. There exists such a function if and only if the
surface is orientable.

Most of the common surfaces we deal with in R³ are orientable. This includes
spheres, cylinders, tori, and any smoothly parameterized surface. However, not all
surfaces in R³ are orientable, as the next example shows.


Example 11.5.11. Find a surface in R³ which is not orientable.

Solution: Such a surface is the Möbius band, illustrated in Figure 11.5.1. Note
that an attempt to continuously assign a normal vector to the points of this surface,
beginning at the left and proceeding in the counterclockwise direction, results in
the vectors pointing in the opposite of the original direction once we return to the
starting point.

A physical example of a Möbius band may be constructed by taking a long,
thin rectangular strip of paper, twisting one end through 180 degrees, and then
gluing it to the opposite end.


Figure 11.5.1. A Möbius Band.

Surface Integrals in 3-Space. Let H be a smoothly parameterized 2-surface in
R³ with trace S. The unit normal to the surface H is defined to be N = N_H/||N_H||.
This appears to depend on the parameterization H and not just on its trace S and,
in fact, by definition, it is a function on the parameter set U of H. However, at a
given point of S, there are only two unit vectors which are orthogonal to the tangent
plane of S and they point in opposite directions. Thus, if two parameterizations of
S give it the same orientation, then they must determine the same unit normal vector
at each point (this also follows from Exercise 11.5.7). In other words, for a smooth
oriented surface, there is a uniquely defined unit normal vector at each point of
the surface. For this reason, we consider the unit normal vector to be a function
of points (x,y,z) on the surface S, rather than a function of points (u,v) in the
parameter set U. Given a parameterization H of the surface, we recover N_H as

N_H(u,v) = ||N_H(u,v)|| N(H(u,v)) or N_H = ||N_H|| N∘H.


Surface Area. Just as we defined the arc length s of a path and the integral 
over a path with respect to the differential ds of arc length, we may define the 
area of a parameterized surface and the integral of a function with respect to the 
differential of surface area. 


Definition 11.5.12. If H : U → R³ is a smoothly parameterized surface, then we
define the surface area of the trace S of H to be
\[ \sigma(S) = \int_U \|N_H(u,v)\|\,du\wedge dv. \]
If f is a continuous function defined on S, then we define its integral with
respect to surface area on S to be
\[ \int_S f\,d\sigma = \int_H f\,d\sigma = \int_U f(H(u,v))\,\|N_H(u,v)\|\,du\wedge dv. \]

This is independent of the parameterization in the sense that if G is another
smoothly parameterized surface which is related to H by a smooth parameter
change P, then the integrals in the above definition are unchanged if we replace H
by G = H∘P. This is due to the change of variables theorem (Theorem 11.4.4) and
the fact that N_{H∘P} = det(dP) N_H∘P (Exercise 11.5.7). This shows, in particular,
that the surface area of the trace S of H is independent of the parameterization.
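The definition of surface area can be tried out numerically. The sketch below is an illustration under several assumptions that are not part of the text: the parameter set is taken to be the open unit square, the partial derivatives ∂H/∂u and ∂H/∂v are approximated by central differences, the integral is approximated by a midpoint rule, and the test surface `hemisphere` is a hypothetical parameterization of the upper unit hemisphere (with one meridian removed, which does not affect the area) chosen only because its area 2π is known.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def partials(H, u, v, eps=1e-6):
    """Central-difference approximations of dH/du and dH/dv at (u, v)."""
    Hu = tuple((p - q) / (2.0 * eps) for p, q in zip(H(u + eps, v), H(u - eps, v)))
    Hv = tuple((p - q) / (2.0 * eps) for p, q in zip(H(u, v + eps), H(u, v - eps)))
    return Hu, Hv

def surface_area(H, n=200):
    """Midpoint-rule approximation of the integral of ||N_H|| over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * h, (j + 0.5) * h
            Hu, Hv = partials(H, u, v)
            N = cross(Hu, Hv)                       # the normal vector N_H(u, v)
            total += math.sqrt(sum(c * c for c in N)) * h * h
    return total

def hemisphere(u, v):
    """Hypothetical example: u is a scaled polar angle, v a scaled longitude."""
    polar, lon = 0.5 * math.pi * u, 2.0 * math.pi * v
    return (math.sin(polar) * math.cos(lon),
            math.sin(polar) * math.sin(lon),
            math.cos(polar))

area = surface_area(hemisphere)
print(area, 2.0 * math.pi)   # the two values should be close
```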




Let (x,y,z) = H(u,v), (u,v) ∈ U, be a smooth parameterization of a 2-surface
S in R³, as above, and let φ = f₁ dy ∧ dz + f₂ dz ∧ dx + f₃ dx ∧ dy be a 2-form defined
on a neighborhood of the trace of H. By Example 11.3.12, if we let F = (f₁, f₂, f₃)
be the component vector field of φ, then
\[ H^*(\phi) = (F\circ H)\cdot\left(\frac{\partial H}{\partial u}\times\frac{\partial H}{\partial v}\right) du\wedge dv = (F\circ H)\cdot N_H\,du\wedge dv. \tag{11.5.1} \]
If we use the notation dσ = ||N_H|| du ∧ dv, this allows us to express the integral
of the 2-form φ over a smoothly parameterized surface H in its classical form as an
integral with respect to surface area over the trace S of H:
\[ \int_H \phi = \int_S F\cdot N\,d\sigma. \tag{11.5.2} \]


This has physical interpretations in certain situations. For example, if F is the
velocity field of a fluid moving in R³, then the integral represents the flux or rate
of flow of the fluid across the surface S.
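Formula (11.5.1) suggests a direct way to approximate the integral of a 2-form over a parameterized surface: integrate (F∘H)·N_H over the parameter set. The sketch below is only an illustration under stated assumptions. The parameter set is taken to be the unit square, the partial derivatives are approximated by central differences, and the names `flux`, `J`, and `F` are introduced here. As a test case it uses the surface J(u,v) = (u, (1−u)v, u − 2(1−u)v + 5) of Example 11.5.7 together with the component vector field F = (−y, x, 0) of ω = −y dy ∧ dz + x dz ∧ dx, for which the text obtains the value 1/2.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux(F, H, n=200, eps=1e-6):
    """Midpoint-rule approximation of the integral over the unit square of
    (F o H) . (dH/du x dH/dv), as in formula (11.5.1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * h, (j + 0.5) * h
            # Central-difference approximations of the partial derivatives of H.
            Hu = tuple((p - q) / (2.0 * eps) for p, q in zip(H(u + eps, v), H(u - eps, v)))
            Hv = tuple((p - q) / (2.0 * eps) for p, q in zip(H(u, v + eps), H(u, v - eps)))
            N = cross(Hu, Hv)                       # the normal vector N_H(u, v)
            Fvalue = F(*H(u, v))                    # F evaluated on the surface
            total += sum(f * c for f, c in zip(Fvalue, N)) * h * h
    return total

def J(u, v):
    """Surface of Example 11.5.7: J = H o P with H(x, y) = (x, y, x - 2y + 5)."""
    x, y = u, (1.0 - u) * v
    return (x, y, x - 2.0 * y + 5.0)

def F(x, y, z):
    """Component vector field of omega = -y dy^dz + x dz^dx."""
    return (-y, x, 0.0)

value = flux(F, J)
print(value)   # should be close to 1/2
```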


Integration over a 2-cell in R^d. In the previous section, we defined 2-cells in
R^d (Definition 11.4.5).

We may think of a simple 2-cell in R^d as a smoothly parameterized 2-surface in
R^d along with a path ∂E which runs around the edge of this surface. The boundary,
∂E, of a 2-cell E is, as before, the path which is the composition of E with the
path ∂I² in R². In general, this will not be the same as the topological boundary of
E(I²). In particular, in dimensions higher than 2, the trace E(I²) has no interior
and is, therefore, its own topological boundary, whereas ∂E is just a path in R^d
which runs around the edge of E(I²).

Considered as a smoothly parameterized 2-surface defined on the interior of I²,
a 2-cell E satisfies the conditions of Definition 11.5.2. Similarly, a 2-form φ defined
on a set containing the trace E(I²) is continuous, hence bounded, on E(I²) and so
it also satisfies the conditions of Definition 11.5.2. Hence, the integral
\[ \int_E \phi = \int_{I^2} E^*(\phi) \]
exists. It is this surface integral over a 2-cell E that we use in formulating Stokes's
Theorem.


Stokes's Theorem. Stokes's Theorem for two-dimensional surfaces is much like
Green's Theorem. The difference is that two-dimensional surfaces lying in R^d for
d ≥ 3 replace regions in R². The result is still stated in terms of 2-cells, but now
they are 2-cells in dimension higher than 2. We will be primarily concerned with
2-cells in R³.


Theorem 11.5.13 (Stokes's Theorem). Let E : I² → R^d be a 2-cell and let φ
be a smooth 1-form defined on an open set in R^d containing E(I²). Then
\[ \int_{\partial E} \phi = \int_E d\phi. \]

The proof is identical to the proof of Green's Theorem (Theorem 11.4.10). □




Remark 11.5.14. As with Green's Theorem, although Stokes's Theorem is stated
in terms of a cell E and its boundary ∂E, in practice one computes the integral over
E or ∂E using a convenient parameterization which may have little to do with E.


Example 11.5.15. Use Stokes's Theorem to calculate the integral of the 1-form
(x + y) dz around the boundary of the surface
\[ z = x^2 - y^2 - 2x + 2y, \qquad 0 \le x \le 1, \quad 0 \le y \le 1, \]
where the boundary path is traversed in the counterclockwise direction when seen
from above the surface (the positive z-axis points up).

Solution: We parameterize the surface by setting x = u, y = v, z = u² − v² − 2u + 2v.
That is, we represent the surface as the trace of the 2-cell E(u,v) =
(u, v, u² − v² − 2u + 2v), (u,v) ∈ I². Traversing ∂I² in the counterclockwise direction
causes E(u,v) to traverse the boundary of our surface in the required direction.
Since d((x + y) dz) = dy ∧ dz − dz ∧ dx, Stokes's Theorem implies that
\[ \int_{\partial E} (x+y)\,dz = \int_E (dy\wedge dz - dz\wedge dx). \]
We have dx = du, dy = dv, and dz = (2u − 2) du − (2v − 2) dv for the
parameterization determined by E. Thus,
\[ \int_E (dy\wedge dz - dz\wedge dx) = \int_{I^2} (4 - 2u - 2v)\,du\wedge dv = \int_0^1\!\!\int_0^1 (4 - 2u - 2v)\,du\,dv = 2. \]
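The other side of Stokes's Theorem in this example, the line integral of (x + y) dz around ∂E, can be approximated directly and compared with the value 2. The following is a rough sketch, assuming the cell E(u,v) = (u, v, u² − v² − 2u + 2v) and a midpoint-type sum along each of the four sides of the square; the function names and the number of subdivisions are choices made here for illustration.

```python
def E(u, v):
    """The 2-cell of Example 11.5.15."""
    return (u, v, u * u - v * v - 2.0 * u + 2.0 * v)

def line_integral_around_boundary(n_per_side=2000):
    """Approximate the integral of (x + y) dz along E composed with the
    counterclockwise boundary path of the unit square."""
    corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    h = 1.0 / n_per_side
    total = 0.0
    for i in range(4):
        (u0, v0), (u1, v1) = corners[i], corners[(i + 1) % 4]

        def point(s):
            # Point of the surface above the position s in [0, 1] along this side.
            return E(u0 + s * (u1 - u0), v0 + s * (v1 - v0))

        for k in range(n_per_side):
            x, y, _ = point((k + 0.5) * h)                  # midpoint of the piece
            dz = point((k + 1) * h)[2] - point(k * h)[2]    # change in z over the piece
            total += (x + y) * dz
    return total

value = line_integral_around_boundary()
print(value)   # should be close to 2
```

The agreement of this value with the surface integral computed above is what Stokes's Theorem predicts for this cell.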


Example 11.5.16. Let ω = x dy ∧ dz − y dz ∧ dx − 2y dx ∧ dy. Find the integral
of the 2-form ω over the torus T described as follows: for each point (x,y,0) on the
circle A = {(x,y,0) ∈ R³ : x² + y² = 4}, let C_xy be a circle in R³, of radius 1, which is
centered at (x,y,0) and lies in the plane through the origin perpendicular to the
circle A. Then T is the union of all the circles C_xy (see Figure 11.5.2). Note that
T is a smooth two-dimensional surface.

Solution: We may parameterize T as follows:
\[ \begin{aligned} x &= (2 + \cos 2\pi t)\cos 2\pi s, \\ y &= (2 + \cos 2\pi t)\sin 2\pi s, \\ z &= \sin 2\pi t, \end{aligned} \]
with 0 ≤ s ≤ 1 and 0 ≤ t ≤ 1. In other words, T is the trace of the 2-cell
E : I² → R³ given by
E(s,t) = ((2 + cos 2πt) cos 2πs, (2 + cos 2πt) sin 2πs, sin 2πt).

Now the 2-form ω is dφ where φ is the 1-form φ = y² dx + xy dz. Thus, by Stokes's
Theorem
\[ \int_E \omega = \int_E d\phi = \int_{\partial E} \phi. \tag{11.5.3} \]


However, ∂E is made up of four parameterized circles. Two of them are

γ₁(t) = ((2 + cos 2πt), 0, sin 2πt) and γ₂(s) = (3 cos 2πs, 3 sin 2πs, 0),

and the other two are γ₃(t) = γ₁(1 − t) and γ₄(s) = γ₂(1 − s); that is, γ₃ and γ₄ are
just γ₁ and γ₂ traversed in the reverse direction. It follows that the contributions
of the integrals over these paths cancel and, hence, that the integrals in (11.5.3)
are all 0.

Figure 11.5.2. The Torus of Example 11.5.16.


Classical Form of Stokes's Theorem. If φ = f₁ dx + f₂ dy + f₃ dz is a 1-form
and F is the component vector field F = (f₁, f₂, f₃), then by Remark 11.3.4, dφ
has curl F as its component vector field. Using this and (11.5.2) yields the classical
form of Stokes's Theorem.


Theorem 11.5.17. Let $E$ be a simple 2-cell in $\mathbb{R}^3$ with trace $S$ and let $\phi = f_1\,dx +
f_2\,dy + f_3\,dz$ be a 1-form defined on the trace of $E$. With $N$ the normal vector for
$E$ as defined above, $T$ the tangent vector to the path $\partial E$, and $F$ the vector field
$F = (f_1, f_2, f_3)$, we have

$$\int_{\partial E} F \cdot T\,ds = \int_S \operatorname{curl} F \cdot N\,d\sigma.$$


Proof. The integral on the left is just the path integral $\int_{\partial E} \phi$ interpreted as in
Theorem 11.4.15. By Stokes's Theorem, Remark 11.3.4, and (11.5.2) this is equal
to

$$\int_E d\phi = \int_S \operatorname{curl} F \cdot N\,d\sigma. \qquad \square$$


Exercise Set 11.5

1. For the part of the surface $x + y + z = 1$ that lies in the first octant, find a
smooth parameterization $H$ for which the normal vector points up, and then
compute the integral of the 2-form $\omega = x^2\,dy \wedge dz$ over $H$.

2. For the surface $z = 1 - x^2 - y^2$, $z \ge 0$, find a smooth parameterization $H$,
with normal vector pointing up, and then compute the integral of the 2-form
$\omega = x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy$ over this surface.

3. For the smoothly parameterized surface in $\mathbb{R}^3$ defined by

$$H(u,v) = (5u, \cos 2\pi v, \sin 2\pi v), \qquad (u,v) \in I^2,$$

describe the trace of $H$ and then compute the integral over $H$ of the 2-form
$\omega = y\,dy \wedge dz - x\,dz \wedge dx + z\,dx \wedge dy$.

4. Find the integral over the sphere $x^2 + y^2 + z^2 = 1$ of the 2-form $dy \wedge dz - 2\,dz \wedge dx$.

5. If $H$ is a parameterized 2-surface in $\mathbb{R}^3$, with $(x,y,z) = H(u,v)$, and if $N_H =
(g_1, g_2, g_3)$ is its normal vector field, show that $H^*(dy \wedge dz) = g_1\,du \wedge dv$,
$H^*(dz \wedge dx) = g_2\,du \wedge dv$, and $H^*(dx \wedge dy) = g_3\,du \wedge dv$.

6. Find the normal $N_H$ and unit normal $N$ for the parameterized torus of Example
11.5.16.

7. Show that if $H : U \to \mathbb{R}^3$ is a smoothly parameterized 2-surface and if $P : V \to
U$ is a smooth parameter change, then the normal vectors of $H$ and $H \circ P$ are
related by $N_{H \circ P} = \det(dP)\, N_H \circ P$.

8. Let $H$ be a parameterized surface in $\mathbb{R}^3$ with trace $S$ and let $N = (n_1, n_2, n_3)$
be the unit normal vector field on $S$. Show that the area of $S$ is $\int_H \eta$, where
$\eta = n_1\,dy \wedge dz + n_2\,dz \wedge dx + n_3\,dx \wedge dy$.
Hint: Use (11.5.2).

9. Use Stokes's Theorem to compute the integral of the 2-form

$$\omega = y\,dy \wedge dz + 2\,dz \wedge dx + dx \wedge dy$$

over the hemisphere $x^2 + y^2 + z^2 = 1$, $z \ge 0$, oriented so that the normal vector
points up. Hint: $\omega = d\phi$ for a certain 1-form $\phi$.

10. If $F(x,y,z) = (xy, yz, xz)$ and $S$ is that part of the plane $x + y + z = 1$ which
lies in the first octant, oriented such that the normal vector $N$ points up, use
the classical form of Stokes's Theorem to compute $\int_S \operatorname{curl} F \cdot N\,d\sigma$.

11. If $\phi = z\,dx + 3x\,dy - y\,dz$, use Stokes's Theorem to compute the integral of the
1-form $\phi$ over the ellipse which is the intersection of the cylinder $x^2 + y^2 = 9$
with the plane $z = x$. Hint: The ellipse is the boundary of the surface consisting
of that part of the plane $z = x$ which lies inside the cylinder.

12. Show why the integral of $d\phi$ over the sphere

$$S = \{(x,y,z) : x^2 + y^2 + z^2 = 1\}$$

is 0 for every smooth 1-form $\phi$ on $S$.




11.6. Gauss’s Theorem 


In this section, we generalize Green's Theorem to the case of a 3-cell in $\mathbb{R}^3$. The
result is Gauss's Theorem. It relates the integral of a 2-form $\phi$ over the boundary
of a 3-cell with the integral of the 3-form $d\phi$ over the 3-cell itself. We begin with
a brief discussion of integrals of 3-forms in $\mathbb{R}^3$.


The Integral of a 3-form. A 3-form in $\mathbb{R}^3$ has the form $\phi = f\,dx \wedge dy \wedge dz$ for
some continuous function $f$. As with 2-forms in $\mathbb{R}^2$, we define the integral of such
a 3-form in $\mathbb{R}^3$ over a Jordan region $U$ to be

$$\int_U \phi = \int_U f\,dV. \tag{11.6.1}$$


Just as it did with integrals of 2-forms, the change of variables theorem leads 
to a change of variables theorem for integration of 3-forms. The proof is the same 
as the proof of the two-dimensional version in Theorem 11.4.4. 


Theorem 11.6.1. Let $H$ be a smooth transformation from the open Jordan region
$U$ in $\mathbb{R}^3$ to another Jordan region in $\mathbb{R}^3$ and suppose $H$ is one-to-one with
non-singular differential on $U$. If $\phi$ is a bounded 3-form on $H(U)$ and $H^*(\phi)$ is bounded
on $U$, then

$$\int_{H(U)} \phi = \int_U H^*(\phi)$$

provided $\det(dH) > 0$ everywhere on $U$. If $\det(dH) < 0$ on $U$, equality holds if the
right side of the equation is replaced by its negative.


A transformation H which satisfies the above conditions will be called a smooth 
parameter change. 


Example 11.6.2. Find the integral of the 3-form $\phi = z^{-1}(x^2 + y^2)\,dx \wedge dy \wedge dz$ over the
truncated cone $C = \{(x,y,z) : x^2 + y^2 \le z^2,\ 1 \le z \le 2\}$.

Solution: We could do this problem as an ordinary triple integral in rectangular
coordinates. However, we choose to parameterize $C$ using something like
cylindrical coordinates (conical coordinates, actually). We let $R$ be the rectangle
defined by $0 \le r \le 1$, $0 \le \theta \le 2\pi$, and $1 \le z \le 2$ and define $H : R \to C$ by

$$H(r,\theta,z) = (rz\cos\theta,\ rz\sin\theta,\ z).$$

That is, we make the change of variables

$$x = rz\cos\theta, \qquad y = rz\sin\theta, \qquad z = z.$$

Then

$$dx = z\cos\theta\,dr - rz\sin\theta\,d\theta + r\cos\theta\,dz,$$
$$dy = z\sin\theta\,dr + rz\cos\theta\,d\theta + r\sin\theta\,dz,$$
$$dz = dz,$$

so that $dx \wedge dy \wedge dz = rz^2\,dr \wedge d\theta \wedge dz$, while $z^{-1}(x^2 + y^2) = r^2 z$. Thus,

$$H^*(\phi) = r^3 z^3\,dr \wedge d\theta \wedge dz$$

and

$$\int_H \phi = \int_R H^*(\phi) = \int_1^2 \int_0^{2\pi} \int_0^1 r^3 z^3\,dr\,d\theta\,dz = \frac{15\pi}{8}.$$
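The value of this integral can be sanity-checked numerically. The sketch below assumes the coefficient of the 3-form is $(x^2 + y^2)/z$, which is the reading consistent with the pulled-back integrand $r^3 z^3$ and the value $15\pi/8$; the function names are illustrative only. It applies a midpoint rule to $f(H(r,\theta,z)) \cdot \det(dH)$ over the rectangle $R$.

```python
import math

def pullback_integrand(r, theta, z):
    """f(H(r, theta, z)) * det(dH) for H(r, theta, z) = (rz cos, rz sin, z)."""
    x = r * z * math.cos(theta)
    y = r * z * math.sin(theta)
    f = (x * x + y * y) / z   # assumed coefficient of dx ^ dy ^ dz
    jacobian = r * z * z      # det(dH) computed from dx, dy, dz above
    return f * jacobian

def midpoint_triple(f, bounds, n=40):
    """Midpoint rule on a box given as [(a1, b1), (a2, b2), (a3, b3)]."""
    (a1, b1), (a2, b2), (a3, b3) = bounds
    h1, h2, h3 = (b1 - a1) / n, (b2 - a2) / n, (b3 - a3) / n
    total = 0.0
    for i in range(n):
        u = a1 + (i + 0.5) * h1
        for j in range(n):
            v = a2 + (j + 0.5) * h2
            for k in range(n):
                w = a3 + (k + 0.5) * h3
                total += f(u, v, w)
    return total * h1 * h2 * h3

cone_integral = midpoint_triple(
    pullback_integrand, [(0.0, 1.0), (0.0, 2 * math.pi), (1.0, 2.0)]
)
```

With 40 subdivisions per axis the approximation agrees with $15\pi/8 \approx 5.89$ to about two decimal places; refining the grid tightens the agreement.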


The Boundary of a Cube. Our next task is to prove Gauss's Theorem on the
standard cube $I^3$ in $\mathbb{R}^3$. In order to formulate the theorem, we need to fix an
orientation on the boundary of the cube.


The boundary of a cube is not a smooth surface. It consists of six squares, 
which are smooth surfaces, joined together along their sides. We choose to orient 
each of these in such a way that a corresponding normal vector points away from 
the cube. That is, an ordered pair of vectors in one of the sides has the correct 
orientation if the cross product of these vectors points to the exterior of the cube. 


One way to parameterize the six faces is as follows: we let $(s,t)$ be the
coordinates of a point on the standard square $I^2$. Then

$$F^{10}(s,t) = (0,s,t) \quad\text{and}\quad F^{11}(s,t) = (1,s,t)$$

parameterize the two faces perpendicular to the $x$-axis, while

$$F^{20}(s,t) = (s,0,t) \quad\text{and}\quad F^{21}(s,t) = (s,1,t),$$
$$F^{30}(s,t) = (s,t,0) \quad\text{and}\quad F^{31}(s,t) = (s,t,1)$$

parameterize the faces perpendicular to the $y$- and $z$-axes, respectively.
Unfortunately, three of these have the wrong orientation. For example, $F^{10}$ and $F^{11}$ each
send the standard basis in $\mathbb{R}^2$ to a pair of vectors in $\mathbb{R}^3$ with cross product pointing
in the positive $x$-direction. Hence, they don't both point to the exterior of the
cube. In fact, for $F^{10}$ this cross product vector points to the interior of the cube.
In general, the orientation of $F^{i\sigma}$ is correct if $i + \sigma$ is even and it is incorrect if $i + \sigma$
is odd. Thus, an integral over a face with $i + \sigma$ odd will have the wrong sign. We
can fix this by multiplying the integral by $-1$. This idea leads to an interpretation
of the boundary of the cube $I^3$ as a formal sum

$$\partial I^3 = \sum_{i,\sigma} (-1)^{i+\sigma} F^{i\sigma}, \tag{11.6.2}$$

where $i$ runs from 1 to 3 and $\sigma$ runs from 0 to 1. We then define the integral of a
2-form $\phi$ over $\partial I^3$ to be

$$\int_{\partial I^3} \phi = \sum_{i,\sigma} (-1)^{i+\sigma} \int_{F^{i\sigma}} \phi. \tag{11.6.3}$$

We would get the same result if we just reversed the orientation of each face $F^{i\sigma}$ with
$i + \sigma$ odd and then took the sum of the integrals over the resulting parameterized
surfaces. However, there is an advantage to writing the integral as in (11.6.3),
which will become apparent in the next section.
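The parity rule for the orientation of the faces can be confirmed by direct computation. The following is an illustrative sketch (the function names `face`, `cross`, `normal`, and `points_outward` are ours, not the text's): for each face $F^{i\sigma}$ it forms the cross product of the images of the standard basis vectors and compares its direction with the outward direction.

```python
def face(i, sigma):
    """Return the face F^{i sigma} as a function (s, t) -> point of the cube."""
    def F(s, t):
        coords = [s, t]
        coords.insert(i - 1, sigma)  # the constant sigma goes in slot i
        return tuple(coords)
    return F

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normal(i, sigma):
    """Cross product of dF/ds and dF/dt; the faces are affine, so
    finite differences give the partial derivatives exactly."""
    F = face(i, sigma)
    base = F(0, 0)
    ds = tuple(a - b for a, b in zip(F(1, 0), base))
    dt = tuple(a - b for a, b in zip(F(0, 1), base))
    return cross(ds, dt)

def points_outward(i, sigma):
    """True if the normal of F^{i sigma} points to the exterior of the cube."""
    outward_sign = 1 if sigma == 1 else -1
    return normal(i, sigma)[i - 1] * outward_sign > 0

orientation_ok = {(i, s): points_outward(i, s)
                  for i in (1, 2, 3) for s in (0, 1)}
```

The resulting dictionary marks exactly the faces with $i + \sigma$ even as correctly oriented, which is the sign pattern $(-1)^{i+\sigma}$ used in (11.6.2).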


With these conventions established, we may state and prove Gauss's Theorem
for the standard cube in $\mathbb{R}^3$.


Gauss’s Theorem on a Cube. The proof of Gauss’s Theorem on a cube is not 
materially different from the proof of Green’s Theorem on a square. 




Theorem 11.6.3. Suppose $\phi$ is a smooth 2-form defined on $I^3$. Then

$$\int_{\partial I^3} \phi = \int_{I^3} d\phi.$$


Proof. We first show that the theorem holds for $\phi$ of the form $\phi = f\,dy \wedge dz$. With
$\partial I^3$ represented as in (11.6.2), we have

$$\int_{\partial I^3} \phi = \sum_{i,\sigma} (-1)^{i+\sigma} \int_{F^{i\sigma}} \phi = \sum_{i,\sigma} (-1)^{i+\sigma} \int_{F^{i\sigma}} f\,dy \wedge dz.$$

The integral on the right in this equation will vanish if either $y$ or $z$ is constant on
the face $F^{i\sigma}$. Thus, only the integrals of $f\,dy \wedge dz$ over the faces $F^{10}$ and $F^{11}$ may
be non-zero. This implies

$$\int_{\partial I^3} \phi = \int_0^1\!\!\int_0^1 \big(f(1,s,t) - f(0,s,t)\big)\,ds\,dt = \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{\partial f}{\partial x}(x,s,t)\,dx\,ds\,dt = \int_{I^3} d\phi$$

by the Fundamental Theorem of Calculus applied to the integral in the $x$-direction.

If $\phi$ has the form $g\,dz \wedge dx$ or $h\,dx \wedge dy$, the proof is the same with the variables
and the value of $i$ interchanged. Since every smooth 2-form is a sum of forms for
which the theorem is true and since the integrals involved are linear functions of
the forms in the integrand, the theorem is true in general. $\square$


Gauss’s Theorem for a 3-cell. 


Definition 11.6.4. A 3-cell in $\mathbb{R}^3$ is a smooth function $E : I^3 \to \mathbb{R}^3$. A 3-cell
is simple if it is one-to-one with non-singular differential on the interior of $I^3$. A
simple cell $E$ is positively oriented if $\det(dE) > 0$ on the interior of $I^3$.


As in the definition of 2-cell, the meaning of smooth requires some comment,
since $I^3$ is not an open set. Along each face or edge of $I^3$ some of the partial
derivatives of the coordinate functions of $E$ must be interpreted as one-sided derivatives,
while at interior points of $I^3$ these are the usual 2-sided derivatives. The resulting
functions on $I^3$ are then required to be continuous.

The faces of $E$ are the functions $E^{i\sigma} = E \circ F^{i\sigma}$, where $F^{i\sigma}$ is the $i\sigma$ face
of $I^3$ as defined at the beginning of this section. Thus, $E^{10}(s,t) = E(0,s,t)$,
$E^{11}(s,t) = E(1,s,t)$, $E^{20}(s,t) = E(s,0,t)$, etc. It follows from the above definition
that each face is a 2-cell.

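The boundary of a 3-cell $E$ is defined to be

$$\partial E = \sum_{i,\sigma} (-1)^{i+\sigma} E^{i\sigma},$$

where, as in (11.6.3), this means that the integral of a 2-form $\phi$ over $\partial E$ is defined
to be

$$\int_{\partial E} \phi = \sum_{i,\sigma} (-1)^{i+\sigma} \int_{E^{i\sigma}} \phi.$$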


The following is Gauss’s Theorem for a 3-cell. 




Theorem 11.6.5. If $E$ is a smooth 3-cell in $\mathbb{R}^3$ and if $\phi$ is a smooth 2-form on
the trace of $E$, then

$$\int_{\partial E} \phi = \int_E d\phi.$$


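Proof. This is just like the proof of Green's Theorem for a 2-cell. We have

$$\int_{\partial E} \phi = \int_{\partial I^3} E^*(\phi) = \int_{I^3} d\big(E^*(\phi)\big) = \int_{I^3} E^*(d\phi) = \int_E d\phi$$

by Theorem 11.6.3 and Theorem 11.3.10(c). $\square$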


Example 11.6.6. Find the integral of the 2-form

$$\phi = (x^2 + y)\,dy \wedge dz + (2xz - y)\,dx \wedge dz + (xy^2 + z)\,dx \wedge dy$$

over the boundary of the solid $A$ defined by the inequalities $0 \le z \le 1 - x^2 - y^2$.

Solution: The set $A$ is the trace of a 3-cell with boundary equal to the boundary
of $A$ (it doesn't matter what the 3-cell is, just that one exists). We use Gauss's
Theorem, which tells us that the integral we seek is equal to $\int_A d\phi$. We will
parameterize $A$ using cylindrical coordinates

$$x = r\cos t, \quad y = r\sin t, \quad z = z \quad\text{with}\quad 0 \le z \le 1 - r^2,\ 0 \le r \le 1,\ 0 \le t \le 2\pi.$$

Since $d\phi = (2x + 2)\,dx \wedge dy \wedge dz = 2r(r\cos t + 1)\,dr \wedge dt \wedge dz$, we have

$$\int_A d\phi = \int_0^1\!\!\int_0^{1-r^2}\!\!\int_0^{2\pi} 2(r^2\cos t + r)\,dt\,dz\,dr = \int_0^1\!\!\int_0^{1-r^2} 4\pi r\,dz\,dr = \pi.$$
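As a cross-check of Gauss's Theorem in this example, both sides can be approximated numerically. The sketch below is only illustrative: it assumes the component vector field $F = (x^2 + y,\ y - 2xz,\ xy^2 + z)$ (the middle sign flips because $dx \wedge dz = -dz \wedge dx$), uses the upward normal $(2x, 2y, 1)$ on the cap $z = 1 - x^2 - y^2$ and the downward normal on the bottom disc, and the names `volume_side` and `surface_side` are ours.

```python
import math

def F(x, y, z):
    """Assumed component vector field of phi (dx ^ dz rewritten as -dz ^ dx)."""
    return (x * x + y, y - 2 * x * z, x * y * y + z)

def div_F(x, y, z):
    """Divergence of F, matching d(phi) = (2x + 2) dx ^ dy ^ dz."""
    return 2 * x + 2

def volume_side(n=40):
    """Midpoint rule for the integral of div F over A in cylindrical coordinates."""
    total = 0.0
    hr, ht = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * hr
        hz = (1 - r * r) / n          # z ranges over [0, 1 - r^2]
        for j in range(n):
            t = (j + 0.5) * ht
            x, y = r * math.cos(t), r * math.sin(t)
            for k in range(n):
                z = (k + 0.5) * hz
                total += div_F(x, y, z) * r * hr * ht * hz
    return total

def surface_side(n=100):
    """Midpoint rule for the outward flux of F through the boundary of A."""
    total = 0.0
    hr, ht = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            x, y = r * math.cos(t), r * math.sin(t)
            f1, f2, f3 = F(x, y, 1 - r * r)
            top = 2 * x * f1 + 2 * y * f2 + f3   # F . (2x, 2y, 1) on the cap
            bottom = -F(x, y, 0.0)[2]            # F . (0, 0, -1) on the disc
            total += (top + bottom) * r * hr * ht
    return total
```

Calling `volume_side()` and `surface_side()` returns two values close to $\pi$, as the theorem predicts.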


Example 11.6.7. For $0 < b < 1$, let $B$ be that part of the solid sphere of radius
1, centered at the origin, that lies between the planes $z = -b$ and $z = b$. Compute
the volume of $B$ in two ways: first as an integral over $B$ and second as a surface
integral over $\partial B$.

Solution: The volume we seek is $\int_B dx \wedge dy \wedge dz$. We parameterize $B$ using
cylindrical coordinates. Then $x = r\cos\theta$, $y = r\sin\theta$, and $z = z$ with $0 \le r \le 1$,
$0 \le \theta \le 2\pi$, and $-b \le z \le b$. We know $dx \wedge dy \wedge dz = r\,dr \wedge d\theta \wedge dz$ and $r = \sqrt{1 - z^2}$
at points on the surface of the sphere. Thus,

$$\int_B dx \wedge dy \wedge dz = \int_{-b}^{b}\!\int_0^{2\pi}\!\int_0^{\sqrt{1-z^2}} r\,dr\,d\theta\,dz = \pi\int_{-b}^{b} (1 - z^2)\,dz = 2\pi(b - b^3/3).$$

This is the result of the calculation of the volume integral.

To compute the volume of $B$ using a surface integral, we use Gauss's Theorem.
Since $d(z\,dx \wedge dy) = dz \wedge dx \wedge dy = dx \wedge dy \wedge dz$, Gauss's Theorem tells us that

$$\int_B dx \wedge dy \wedge dz = \int_{\partial B} z\,dx \wedge dy = \int_{\partial B} zr\,dr \wedge d\theta,$$

where the latter integral results from switching to cylindrical coordinates.




Figure 11.6.1. Horizontal Slice of a Sphere. 


The surface $\partial B$ is made up of three parts: a section $S$ of the sphere defined by
the conditions $r = \sqrt{1 - z^2}$, $-b \le z \le b$, and top and bottom horizontal discs $D^+$
and $D^-$ defined by $z = \pm b$, $0 \le r \le \sqrt{1 - b^2}$.

The horizontal discs each have radius $\sqrt{1 - b^2}$ and so the contribution of the top
disc $D^+$ to the integral $\int_{\partial B} zr\,dr \wedge d\theta$ is $b(1 - b^2)\pi$. The bottom disc $D^-$ appears,
at first glance, to yield the negative of this since everything appears to be the same
except that $z = b$ on $D^+$ and $z = -b$ on $D^-$. However, this is not correct. As
part of $\partial B$, the bottom disc $D^-$ has negative orientation relative to the standard
$(x,y)$-coordinates in the plane while $D^+$ has positive orientation. The negative
orientation of $D^-$ reverses the direction of integration with respect to $\theta$ and, hence,
reverses the sign of the integral. This leads to a result which is identical to that
computed for $D^+$. Thus, the combined contribution of $D^-$ and $D^+$ to $\int_{\partial B} zr\,dr \wedge d\theta$
is $2\pi b(1 - b^2)$.

To compute the contribution of the spherical section $S$, we use the $z$ and $\theta$
coordinates to parameterize $S$. Then $r = \sqrt{1 - z^2}$ on $S$ and so

$$\int_S zr\,dr \wedge d\theta = \int_0^{2\pi}\!\int_{-b}^{b} z^2\,dz\,d\theta = \frac{4\pi b^3}{3}.$$

Adding the various contributions gives us

$$\int_B dx \wedge dy \wedge dz = \int_{\partial B} zr\,dr \wedge d\theta = \frac{4\pi b^3}{3} + 2\pi b(1 - b^2) = 2\pi(b - b^3/3).$$

Fortunately, this is the same answer as before.


Classical Form of Gauss's Theorem. If $\phi = f_1\,dy \wedge dz + f_2\,dz \wedge dx + f_3\,dx \wedge dy$
is a 2-form in $\mathbb{R}^3$ and if we let $F = (f_1, f_2, f_3)$ be its component vector field, then

$$d\phi = \operatorname{div} F\,dx \wedge dy \wedge dz,$$

where $\operatorname{div} F = \partial f_1/\partial x + \partial f_2/\partial y + \partial f_3/\partial z$. If we combine this with (11.5.2) and
Theorem 11.6.5, the result is the classical form of Gauss's Theorem:


Theorem 11.6.8. Let $E$ be a 3-cell in $\mathbb{R}^3$ with trace $A$. Suppose $\partial E$ has trace
equal to the topological boundary $\partial A$ of $A$ and suppose $F$ is a smooth vector function
defined on the trace $A$ of $E$. Then

$$\int_{\partial A} F \cdot N\,d\sigma = \int_A \operatorname{div} F\,dV.$$


In a fluid flow problem, where $F$ is the velocity field of the flow, this has the
following interpretation. The left side represents the flux, or rate of flow of fluid
out of the region $A$, while the right side is the integral over $A$ of a function $\operatorname{div} F$
which represents, at each point of $A$, the tendency of the fluid to move away from
(diverge from) the point.


The Integral over a 3-Surface in $\mathbb{R}^d$. A smoothly parameterized 3-surface in
$\mathbb{R}^d$ is a smooth function $H : U \to \mathbb{R}^d$ such that $U$ is an open subset of $\mathbb{R}^3$ and $dH$
is non-singular on $U$. The trace of $H$ is its image in $\mathbb{R}^d$.


Just as an ordered basis for a two-dimensional vector space determines an
orientation for the vector space, an ordered basis for a vector space of dimension 3
or higher also determines an orientation for the vector space. Two ordered bases
determine the same orientation if and only if the determinant of the matrix which
transforms the first basis to the second is positive.


As before, $H$ determines an orientation on its trace $S = H(U)$. That is, $dH(a)$
sends the standard basis in $\mathbb{R}^3$ to an ordered basis for the linear subspace of $\mathbb{R}^d$
whose translate by $b = H(a)$ is the tangent space to $S$ at $b$. A 3-surface in $\mathbb{R}^d$ is
a subset which, in a neighborhood of each of its points, may be given a smooth
parameterization; that is, its intersection with this neighborhood is the trace of
a smoothly parameterized 3-surface. A 3-surface is orientable if there is a smooth
function which assigns an ordered basis, as above, to each point of the surface.


We define the integral of a 3-form over a smoothly parameterized 3-surface
in $\mathbb{R}^d$ in the same way that we defined the integral of a 2-form over a smoothly
parameterized 2-surface.


Definition 11.6.9. If $U$ is an open Jordan region, $H : U \to \mathbb{R}^d$ is a smoothly
parameterized 3-surface, and $\phi$ is a 3-form on $H(U)$ such that $H^*(\phi)$ is bounded
on $U$, we set

$$\int_H \phi = \int_U H^*(\phi).$$


This defines the integral on the left. 


As before, this integral, though defined through the parameterization $H$, is
actually independent of parameterization in the sense that the integral is unchanged
if $H$ is replaced by $J = H \circ P$, where $P : V \to U$ is any positively oriented
smooth parameter change, provided $V$ and $J$ also satisfy the conditions of the above
definition. The integral does depend on the orientation of $H$ and if this is reversed,
then the integral changes sign. Here, a smooth parameter change $P : V \to U$
between open Jordan regions in $\mathbb{R}^3$ is a smooth one-to-one map with non-singular
differential $dP$. It is positively oriented if $\det dP > 0$.




Stokes's Theorem for 3-cells in $\mathbb{R}^d$. The definition of a 3-cell in $\mathbb{R}^d$ is the same
as that of a 3-cell in $\mathbb{R}^3$ except that the trace of the cell lies in $\mathbb{R}^d$ rather than $\mathbb{R}^3$.
Since, on the interior of $I^3$, a 3-cell is a smoothly parameterized surface, we may
integrate a 3-form over it. With no extra work, we have Stokes's Theorem for a
3-cell in $\mathbb{R}^d$, for any $d \ge 3$. Its proof is the same as the proof of Gauss's Theorem.


Theorem 11.6.10. If $E : I^3 \to \mathbb{R}^d$ is a 3-cell in $\mathbb{R}^d$ and $\phi$ is a smooth 2-form defined on
the trace of $E$, then

$$\int_{\partial E} \phi = \int_E d\phi.$$


In the next section, we will state the general form of Stokes's Theorem, which
involves integrals over $p$-cells in $\mathbb{R}^q$ for any $q \ge p$.


Exercise Set 11.6 


1. Suppose $E$ is a positively oriented simple 3-cell in $\mathbb{R}^3$ with trace $A$. Show that
the volume of $A$ is

$$V(A) = \frac{1}{3}\int_{\partial E} (x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy).$$

2. Let $C$ be the solid defined by

$$C = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 \le z \le 1\}.$$

Use Gauss's Theorem to find the integral over the boundary of $C$ of the 2-form

$$\phi = (x + y\sin^3 z)\,dy \wedge dz + (y - \cos zx)\,dz \wedge dx + (3z^2 + \ln(1 + xy))\,dx \wedge dy.$$

3. Show how to construct a 3-cell with trace equal to

$$A = \{(x,y,z) \in \mathbb{R}^3 : a^2 \le x^2 + y^2 + z^2 \le b^2\}.$$

4. For a 3-cell $E$ as in the previous exercise and a 2-form $\phi$ on $A$, show that

$$\int_E d\phi = \int_{C_b} \phi - \int_{C_a} \phi,$$

where $C_a$ and $C_b$ are the spheres of radius $a$ and $b$, respectively, oriented so
that the normal vectors point to the exterior of the sphere. If $d\phi = 0$ what do
you conclude?


5. Show how to extend the result of the previous exercise to more general situations 
where one surface is the boundary of a solid A and the second surface is the 
boundary of a second solid B which is contained in the interior of A. 

6. Let $F$ be a $C^1$ vector field on an open set $U \subset \mathbb{R}^3$. If $a \in U$, use Gauss's
Theorem to prove that

$$\operatorname{div} F(a) = \lim_{r \to 0} \frac{1}{V(B_r(a))} \int_{\partial B_r(a)} F \cdot N\,d\sigma.$$

7. Let $U$ be an open set in $\mathbb{R}^3$ such that $\overline{U}$ is the trace of a 3-cell $E$ and let
$F = (f_1, f_2, f_3)$ be a vector field on the trace $\overline{U}$. There is a 1-form $\phi$ with $F$
as component vector field and a 2-form $\phi^*$ with $F$ as component vector field.
That is,

$$\phi = f_1\,dx + f_2\,dy + f_3\,dz \quad\text{and}\quad \phi^* = f_1\,dy \wedge dz + f_2\,dz \wedge dx + f_3\,dx \wedge dy.$$

Show that

(a) $\phi \wedge \phi^* = F \cdot F\,dx \wedge dy \wedge dz = \|F\|^2\,dx \wedge dy \wedge dz$;

(b) if $\phi = dg$ for some continuous function $g$ on $\overline{U}$ which is $C^2$ on $U$, then
$d\phi^* = \Delta g\,dx \wedge dy \wedge dz$, where $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$ is the Laplacian;

(c) if $g$ is harmonic (i.e. if $\Delta g = 0$ on $U$), then $\int_U \|F\|^2\,dV = \int_{\partial E} g\,\phi^*$;

(d) if $g$ is harmonic and $g = 0$ on the trace of $\partial E$, then $g$ is identically 0 on $U$.


8. Let $r : \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}$ be the function $r(x,y,z) = \sqrt{x^2 + y^2 + z^2}$. Using the
notation of the previous exercise, compute $dr$ and show that $d(1/r) = -dr/r^2$
and $(d(1/r))^* = -dr^*/r^2$. Show that $d(dr^*/r^2) = 0$ and, hence, that $1/r$ is
harmonic on $\mathbb{R}^3 \setminus \{0\}$.

9. The gravitational force field due to a mass at the origin is a constant $k$ times
the component vector field of the 2-form $dr^*/r^2$ of the previous exercise. Show
that if $S$ is a solid sphere in $\mathbb{R}^3$, centered at the origin, then the flux across $\partial S$
due to this field is

$$\int_{\partial S} \frac{k}{r^2}\,dr^* = 4\pi k.$$

Hint: For the surface $\partial S$, show that $N$ is the component vector field of $dr^*$
restricted to $\partial S$. Then use the classical expression for a surface integral (11.5.2).

10. Use Gauss's Theorem to show that the integral in the previous exercise does
not change if the sphere $S$ is replaced by any reasonable solid $A$ with 0 in its
interior. What reasonable assumptions on $A$ will make this true?


11.7. Chains and Cycles 


Much of what we have done with Green's, Stokes's, and Gauss's Theorems in the
previous sections involves integration over cells. However, in some cases, we have
worked with integrals over objects which are sums of cells in a certain sense. In
particular, an integral over the boundary of a cell is not an integral over a cell,
but a sum of integrals over the several cells which form the boundary. In the
previous section we came to think of the boundary of a 3-cell as a formal linear
combination (11.6.2) of 2-cells corresponding to the faces of $I^3$. This suggests that,
for any natural number $k$, we think of the boundary of a $k$-cell as a formal linear
combination of the cells which consist of restrictions of the cell to the various faces
of $I^k$. This will require a theory of integration, not just over cells, but over formal
linear combinations of cells. Expanding on this idea leads to some very powerful
and far-reaching concepts in mathematics. In this section, we will give a brief
introduction to this formalism and then use it to restate Green's, Stokes's, and
Gauss's Theorems in their modern form.

We begin with an introduction to this idea in the context of paths. Here the 
objects we wish to introduce are 1-chains and 1-cycles. 




1-Chains. A path $\gamma$ in $\mathbb{R}^d$ is piecewise smooth, which means that it may be
thought of as several smooth paths $\gamma_1, \ldots, \gamma_n$ joined together end-to-end to form a
single path. The integral of a function over $\gamma$ is then the sum of the integrals over
the paths $\gamma_j$. We may reparameterize each of these paths so as to have parameter
interval $I = [0,1]$ without affecting the integral (Exercise 11.1.4). The formal sum
of the paths $\gamma_j$ is then a 1-chain in the sense of the following definition.


Definition 11.7.1. A 1-chain in $\mathbb{R}^d$ is a formal finite linear combination, with
integral coefficients,

$$\Gamma = \sum_{j=1}^{n} m_j \gamma_j, \tag{11.7.1}$$

of smooth paths in $\mathbb{R}^d$.


Note that (11.7.1) is not a linear combination of the $\gamma_j$ as functions on $[0,1]$;
that is, the multiplication by integers and the sums are not pointwise operations of
$\mathbb{R}^d$-valued functions. It is purely a formal expression and cannot be simplified or
manipulated until we impose some rules for manipulating such expressions. We do
this below.


We agree that if the individual terms $m_j\gamma_j$ in a chain are rearranged, so that
they appear in a different order, then the chain does not change. We agree that the
chain does not change if we drop summands $m_j\gamma_j$ with $m_j = 0$, and we agree that
two summands $m_j\gamma_j$ and $m_k\gamma_k$ with $\gamma_j = \gamma_k$ may be combined to yield $(m_j + m_k)\gamma_j$.
The empty chain (that is, the chain with no summands) is denoted by 0. We
add two chains in the obvious way: the sum of two formal linear combinations of
paths is another formal linear combination of paths. The operation of addition, so
defined, is clearly associative and commutative.


The set of 1-chains, as defined above, forms a commutative group; that is, it
has an operation (+) which is associative and commutative; there is a zero element
(the linear combination with no summands); and each element has an additive
inverse (just replace each coefficient $m_j$ by $-m_j$).


Definition 11.7.2. The expression (11.7.1) for a chain $\Gamma$ is said to be in reduced
form if the $\gamma_j$ are distinct paths and all the $m_j$ are non-zero. Note that each chain
may be expressed in reduced form. We define the trace of a chain $\Gamma$ to be

$$\Gamma(I) = \bigcup_{j=1}^{n} \gamma_j(I),$$

where (11.7.1) is an expression of the chain in reduced form.


1-Cycles. A 0-chain in $\mathbb{R}^d$ is a formal linear combination, with integral
coefficients,

$$\sum_{j=1}^{n} m_j \{x_j\},$$

of singleton subsets $\{x_j\}$ with each $x_j$ in $\mathbb{R}^d$.

Here, the sum is not a sum of vectors in $\mathbb{R}^d$. It is a purely formal sum and can
only be manipulated using the rules we set down: again, terms may be rearranged
in the sum without changing the 0-chain. Terms with 0 as coefficient are dropped,
and terms with the same $\{x_j\}$ may be combined by adding their coefficients. The
empty chain is denoted by 0. Addition is defined as before and the result is another
commutative group. We must be careful here: the addition operation, defined this
way, has nothing to do with the operation of addition in the vector space $\mathbb{R}^d$. The
following example illustrates this fact.


Example 11.7.3. For the 0-chains $C_1$ and $C_2$ in $\mathbb{R}^1$ defined by $C_1 = 1\{2\} +
3\{3.5\} - 2\{0\}$ and $C_2 = 1\{4.9\} + 4\{0\} - 3\{3.5\}$, find $C_1 + C_2$ and simplify it as
much as possible.

Solution: We have

$$\begin{aligned}
C_1 + C_2 &= 1\{2\} + 3\{3.5\} - 2\{0\} + 1\{4.9\} + 4\{0\} - 3\{3.5\} \\
&= 1\{2\} + (3\{3.5\} - 3\{3.5\}) + (-2\{0\} + 4\{0\}) + 1\{4.9\} \\
&= 1\{2\} + (3 - 3)\{3.5\} + (-2 + 4)\{0\} + 1\{4.9\} \\
&= 1\{2\} + 0\{3.5\} + 2\{0\} + 1\{4.9\} = 1\{2\} + 2\{0\} + 1\{4.9\}.
\end{aligned}$$

Note that this does not further simplify to $\{2 + 2 \cdot 0 + 4.9\} = \{6.9\}$. In the group of
0-chains in $\mathbb{R}^1$ it is not true that $2\{0\} = 0$ or that $1\{2\} + 1\{4.9\} = 1\{6.9\}$. We will,
however, commonly drop the coefficient 1 in front of a path or a singleton point.
Then the result of the above computation becomes $\{2\} + 2\{0\} + \{4.9\}$.
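The bookkeeping rules for 0-chains are easy to mechanize. Here is a minimal sketch, under the assumption that a 0-chain is stored as a mapping from points to integer coefficients; the helpers `chain`, `reduce_chain`, and `add` are hypothetical names introduced for illustration. It reproduces the computation of $C_1 + C_2$ above.

```python
from collections import Counter

def reduce_chain(coeffs):
    """Drop terms with coefficient 0, giving the reduced form as a dict."""
    return {x: m for x, m in coeffs.items() if m != 0}

def chain(*terms):
    """Build a 0-chain from (coefficient, point) pairs, combining equal points."""
    coeffs = Counter()
    for m, x in terms:
        coeffs[x] += m
    return reduce_chain(coeffs)

def add(c1, c2):
    """Formal sum of two 0-chains: coefficients add, points are never added."""
    total = Counter(c1)
    for x, m in c2.items():
        total[x] += m
    return reduce_chain(total)

C1 = chain((1, 2), (3, 3.5), (-2, 0))
C2 = chain((1, 4.9), (4, 0), (-3, 3.5))
C_sum = add(C1, C2)   # {2: 1, 0: 2, 4.9: 1}
```

The points serve only as dictionary keys, so the code never forms the vector sum $2 + 4.9$; this mirrors the fact that the group operation on 0-chains is unrelated to addition in $\mathbb{R}^1$.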


Note that if 1-chains are replaced by 0-chains in Definition 11.7.2, we have a
notion of reduced form for 0-chains. Each 0-chain can be put in reduced form. Once
it is expressed in reduced form, the trace of a 0-chain is just the union of the points
of $\mathbb{R}^d$ that appear in this expression. Note that in the previous example, the last
expression in the series of equalities is an expression for $C_1 + C_2$ in reduced form.


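Definition 11.7.4. We define a map, called the boundary map, $\partial$ from 1-chains in
$\mathbb{R}^d$ to 0-chains in $\mathbb{R}^d$ by

$$\partial\Big(\sum_{j=1}^{n} m_j \gamma_j\Big) = \sum_{j=1}^{n} \big(m_j\{\gamma_j(1)\} - m_j\{\gamma_j(0)\}\big).$$

A 1-chain $\Gamma$ in $\mathbb{R}^d$ is called a 1-cycle if $\partial\Gamma = 0$.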


The boundary map $\partial$ from 1-chains to 0-chains is a group homomorphism. This
means that, for any two 1-chains $\Gamma$ and $\Lambda$, $\partial(\Gamma + \Lambda) = \partial\Gamma + \partial\Lambda$.

A smooth path with parameter interval $[0,1]$ is, itself, a 1-chain (a 1-chain
where there is only one summand and its coefficient is 1). Also, as mentioned
earlier, a path $\gamma$ which is not smooth can also be used to produce a 1-chain $\Gamma$ by
breaking the path up into smooth pieces and reparameterizing the pieces so that
they have $[0,1]$ as parameter interval. If this is done, then it turns out that $\gamma$ is a
closed path if and only if $\partial\Gamma = 0$ (Exercise 11.7.13).


Example 11.7.5. Consider the rectangle $R$ in $\mathbb{R}^2$ with vertices the points $(0,0)$,
$(2,0)$, $(2,1)$, and $(0,1)$ (Figure 11.7.1). Represent its boundary as a cycle.

Figure 11.7.1. Boundary of a Rectangle as a Cycle.

Solution: We set $\gamma_1(t) = (2t, 0)$, $\gamma_2(t) = (2, t)$, $\gamma_3(t) = (2 - 2t, 1)$, and $\gamma_4(t) =
(0, 1 - t)$. Then $\Gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4$. Note that $\Gamma(I) = \partial R$ and

$$\begin{aligned}
\partial\Gamma &= \{\gamma_1(1)\} - \{\gamma_1(0)\} + \{\gamma_2(1)\} - \{\gamma_2(0)\} + \{\gamma_3(1)\} - \{\gamma_3(0)\} + \{\gamma_4(1)\} - \{\gamma_4(0)\} \\
&= \{(2,0)\} - \{(0,0)\} + \{(2,1)\} - \{(2,0)\} + \{(0,1)\} - \{(2,1)\} + \{(0,0)\} - \{(0,1)\} \\
&= 0
\end{aligned}$$

and so $\Gamma$ is a cycle.
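The boundary computation for the rectangle can be replayed in code. This is a sketch under stated assumptions: a 1-chain is represented as a list of (coefficient, path) pairs, a 0-chain as a dictionary from endpoints to coefficients, and the function name `boundary` is ours. Endpoints are compared by exact floating-point equality, which is safe here only because the paths are affine with simple coefficients.

```python
def boundary(one_chain):
    """Boundary of a 1-chain given as a list of (coefficient, path) pairs.

    Each path is a function on [0, 1]; the result is a reduced 0-chain
    stored as a dict {endpoint: coefficient}."""
    result = {}
    for m, gamma in one_chain:
        end, start = gamma(1.0), gamma(0.0)
        result[end] = result.get(end, 0) + m
        result[start] = result.get(start, 0) - m
    return {pt: m for pt, m in result.items() if m != 0}

# The four sides of the rectangle with vertices (0,0), (2,0), (2,1), (0,1).
gamma1 = lambda t: (2 * t, 0.0)
gamma2 = lambda t: (2.0, t)
gamma3 = lambda t: (2 - 2 * t, 1.0)
gamma4 = lambda t: (0.0, 1 - t)

Gamma = [(1, gamma1), (1, gamma2), (1, gamma3), (1, gamma4)]
boundary_of_Gamma = boundary(Gamma)            # {} : Gamma is a cycle
boundary_of_gamma1 = boundary([(1, gamma1)])   # a single side is not a cycle
```

The empty dictionary plays the role of the 0-chain 0, so `boundary_of_Gamma == {}` expresses that $\Gamma$ is a cycle, while a single side has the two-point boundary $\{(2,0)\} - \{(0,0)\}$.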

Note that we could also represent the boundary of $R$ as a single path which joins
together the smooth paths $\gamma_1$, $\gamma_2$, $\gamma_3$, and $\gamma_4$. As we shall see below, for the purposes
of integration, the two ways of representing the boundary of $R$ are equivalent.


The boundary of a reasonably nice bounded subset of the plane may be repre- 
sented as the union of a number of smooth curves. The rectangle in Figure 11.7.1 
is one such set. When this is true, we would like to represent the boundary by a 
certain cycle. In Figure 11.7.1 this was the cycle of the previous example. The next 
example describes another such situation. 


Example 11.7.6. In Figure 11.7.2, the region $S$ in the plane consists of points
inside the large circle but outside the union of the two smaller circles. Represent
$\partial S$ by a cycle.

Solution: Smooth curves which trace each of the three circles are:

$$\gamma_1(t) = (4\cos(2\pi t),\ 4\sin(2\pi t)),$$
$$\gamma_2(t) = (2 + \cos(2\pi t),\ \sin(2\pi t)),$$
$$\gamma_3(t) = (-2 + \cos(2\pi t),\ \sin(2\pi t)).$$

Figure 11.7.2. Boundary of $S$ as the Cycle $\gamma_1 - \gamma_2 - \gamma_3$.

Each circle is traced once in the counterclockwise direction by the corresponding
curve. We represent the boundary $\partial S$ of $S$ by the cycle $\Gamma = \gamma_1 - \gamma_2 - \gamma_3$.

Why do we choose to multiply $\gamma_2$ and $\gamma_3$ by $-1$ in the sum defining $\Gamma$? It
is due to the following: while the circle $\gamma_1$ has positive orientation relative to $S$,
the circles $\gamma_2$ and $\gamma_3$ have negative orientation relative to $S$ and multiplying by
$-1$ compensates for this. For the meaning of this statement see the discussion on
orientation of paths in Section 11.4.


p-chains and p-cycles. For any non-negative integer $p$, we will define a $p$-chain
in $\mathbb{R}^d$ to be a formal linear combination of $p$-cells in $\mathbb{R}^d$. First we need to define
what we mean by a $p$-cell in $\mathbb{R}^d$. We have defined 2-cells and 3-cells in previous
sections. A 1-cell in $\mathbb{R}^d$ will be a smooth path in $\mathbb{R}^d$ parameterized on $I = [0,1]$. A
0-cell is just a singleton set $\{x\}$ in $\mathbb{R}^d$.


Definition 11.7.7. We define $p$-cells just as we defined 2-cells and 3-cells. A $p$-cell
in $\mathbb{R}^d$ is a smooth function $E : I^p \to \mathbb{R}^d$. A $p$-cell is simple if it is one-to-one with
non-singular differential on the interior of $I^p$.


As before, in defining what it means for such a function to be smooth on the
compact set $I^p$, on the boundary some partial derivatives must be interpreted as
one-sided derivatives.




Definition 11.7.8. A $p$-chain $C$ in $\mathbb{R}^d$ is a formal linear combination

$$C = \sum_{j=1}^{n} m_j E_j \tag{11.7.2}$$

of $p$-cells in $\mathbb{R}^d$ with integer coefficients.


As we did with 1-chains, we agree that if the individual terms $m_j E_j$ in a chain
are rearranged, so that they appear in a different order, then the chain does not
change. We agree that the chain does not change if we drop summands $m_j E_j$ with
$m_j = 0$, and we agree that two summands $m_j E_j$ and $m_k E_k$ with $E_j = E_k$ may
be combined to yield $(m_j + m_k)E_j$. The empty chain (that is, the chain with no
summands) is denoted by 0. We add two chains in the obvious way: the sum of two
formal linear combinations of $p$-cells is another formal linear combination of $p$-cells.
The operation of addition, so defined, is clearly associative and commutative.

As before, the set of $p$-chains in $\mathbb{R}^d$, as defined above, forms a commutative
group.

The expression (11.7.2) for a chain $C$ is said to be in reduced form if the $E_j$ are
distinct $p$-cells and all the $m_j$ are non-zero. Note that each chain may be expressed
in reduced form. We define the trace of a chain $C$ to be the union of the traces of
the $E_j$ in an expression of the chain in reduced form.


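Boundaries. If $E : I^p \to \mathbb{R}^d$ is a continuous function, then for $j = 1, \ldots, p$ we
consider the $2p$ functions of $p - 1$ variables defined by

$$E^{j0}(x_1, \ldots, x_{p-1}) = E(x_1, \ldots, x_{j-1}, 0, x_j, \ldots, x_{p-1}) \quad\text{and}$$
$$E^{j1}(x_1, \ldots, x_{p-1}) = E(x_1, \ldots, x_{j-1}, 1, x_j, \ldots, x_{p-1}).$$

Each of these is a continuous function from $I^{p-1}$ to $\mathbb{R}^d$. We will call these the
$(p-1)$-dimensional faces of $E$.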


Definition 11.7.9. If $E$ is a $p$-cell, then its boundary, $\partial E$, is the $(p-1)$-chain
defined by

$$\partial E = \sum_{i,\sigma} (-1)^{i+\sigma} E^{i\sigma},$$

where $i$ ranges over $1, \ldots, p$ and $\sigma$ ranges over 0, 1.

If $C = \sum_j m_j E_j$ is a $p$-chain, then we define its boundary $\partial C$ to be the $(p-1)$-chain
$\sum_j m_j\,\partial E_j$. We say that $C$ is a $p$-cycle if $\partial C = 0$.


Recall that the above definition of $\partial E$ is the way we defined the boundary of a
3-cell in the previous section. It is not quite the same but is equivalent to the way
we defined the boundary of a 2-cell in Section 11.4.


Theorem 11.7.10. If $C$ is a $p$-chain with $p \ge 2$, then $\partial^2 C = 0$.

Proof. It is enough to prove this in the case where $C$ is a single cell $E$. Then

$$\partial^2 E = \partial(\partial E) = \sum_{i,\sigma} \sum_{j,\tau} (-1)^{i+\sigma+j+\tau} (E^{i\sigma})^{j\tau}.$$

Note that if $i \le j$, then $(E^{i\sigma})^{j\tau} = (E^{j+1,\tau})^{i\sigma}$. Since these two terms appear with
opposite signs in the above sum, they cancel each other out. But every term in the
above sum is of one of these two types. Hence, the sum is 0. $\square$
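The cancellation in this proof can be checked mechanically for small $p$. The sketch below is an illustration under a simplifying assumption: instead of a general cell $E$, it tracks which coordinates of $I^p$ a face fixes, storing a face as a tuple whose entries are `None` (free) or 0/1 (fixed), so that two iterated faces are the same map exactly when their tuples agree. The function names are ours.

```python
from collections import Counter
from itertools import product

def take_face(face, j, tau):
    """Fix the j-th free coordinate (1-based) of a face to the value tau."""
    free = [k for k, v in enumerate(face) if v is None]
    k = free[j - 1]
    return face[:k] + (tau,) + face[k + 1:]

def boundary(chain):
    """Boundary of a chain {face: coefficient}, with signs (-1)^(i + sigma)."""
    out = Counter()
    for face, m in chain.items():
        q = sum(v is None for v in face)   # dimension of this face
        for i, sigma in product(range(1, q + 1), (0, 1)):
            out[take_face(face, i, sigma)] += m * (-1) ** (i + sigma)
    return {f: m for f, m in out.items() if m != 0}

def boundary_squared_of_cube(p):
    """Apply the boundary twice to the identity cell of I^p."""
    cell = {(None,) * p: 1}
    return boundary(boundary(cell))
```

For $p = 2, 3, 4$ the function `boundary_squared_of_cube(p)` returns the empty dictionary: every $(p-2)$-face arises twice, once as $(E^{i\sigma})^{j\tau}$ and once as $(E^{j+1,\tau})^{i\sigma}$, with opposite signs.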


The previous theorem tells us that the boundary of a chain is always a cycle. 
In particular, the boundary of a p-cell is a (p — 1)-cycle. 


Example 11.7.11. Express the solid rectangle of Example 11.7.5 and Figure 11.7.1
as the trace of a 2-cell $E$ and calculate $\partial E$.

Solution: We set $E(s,t) = (2s, t)$ for $(s,t) \in I^2$. This has the rectangle of
Figure 11.7.1 as trace. By Definition 11.7.9,

$$\partial E = E^{20} + E^{11} - E^{21} - E^{10},$$

where, in terms of the paths $\gamma_j$ of Example 11.7.5,

$$E^{20}(s) = E(s,0) = (2s, 0) = \gamma_1(s),$$
$$E^{11}(s) = E(1,s) = (2, s) = \gamma_2(s),$$
$$E^{21}(s) = E(s,1) = (2s, 1) = \gamma_3(1-s),$$
$$E^{10}(s) = E(0,s) = (0, s) = \gamma_4(1-s).$$

Note that $E^{21}$ and $E^{10}$ are $\gamma_3$ and $\gamma_4$ with orientation reversed. This is why
they each occur with a factor of $(-1)$ in the cycle $\partial E$. This compensates for
the orientation reversal when we do integration over $\partial E$ and ensures that, for the
purposes of integration, the cycle $\partial E$ and the cycle $\Gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4$ are
equivalent.


Integration over Chains and Cycles. Chains exist so that we may integrate
over them. We define the integral of a $p$-form over a $p$-chain below. First, we define
the integral of a $p$-form over a $p$-cell. This is no different than the definitions for
integrals of forms in dimensions 1, 2, or 3. We use the transformation law for how
a $p$-form $\phi$ in $\mathbb{R}^q$ transforms to a $p$-form $E^*(\phi)$ on $I^p$ under a cell $E : I^p \to \mathbb{R}^q$.
This is defined exactly as in Definition 11.3.9.


Definition 11.7.12. If $E : I^p \to \mathbb{R}^q$ is a $p$-cell and $\phi$ is a $p$-form defined on a set
containing $E(I^p)$, then we define the integral of $\phi$ over $E$ by

\[
\int_E \phi = \int_{I^p} E^*(\phi).
\]


We define the integral of a p-form over a p-chain as follows: 


Definition 11.7.13. Let
\[
C = \sum_{j=1}^{n} m_j E_j
\]
be a $p$-chain in $\mathbb{R}^q$, expressed in reduced form. If $\phi$ is a $p$-form defined on the trace
of $C$, then we set

\[
\int_C \phi = \sum_{j=1}^{n} m_j \int_{E_j} \phi. \tag{11.7.3}
\]




It is a consequence of this definition that if $C_1$ and $C_2$ are two $p$-chains and $\phi$
is a $p$-form defined and continuous on a set containing both the trace of $C_1$ and the
trace of $C_2$, then

\[
\int_{C_1 + C_2} \phi = \int_{C_1} \phi + \int_{C_2} \phi. \tag{11.7.4}
\]


The proof of this fact is left to the exercises (Exercise 11.7.12). 


Definition 11.7.14. Suppose $C_1$ and $C_2$ are $p$-chains in $\mathbb{R}^q$. We will say that $C_1$
and $C_2$ are equivalent if they have the same trace and

\[
\int_{C_1} \phi = \int_{C_2} \phi
\]

for every $p$-form $\phi$ on the trace of $C_1$.


In general, a $p$-cell $E$ is equivalent to any $p$-cell $F$ for which there is a positively
oriented smooth parameter change $P$ such that $F = E \circ P$. If $P$ is negatively
oriented, then the chain $(-1)E$ is equivalent to $F$. In the case of a 1-cell $\gamma$ (a
smooth path) this is illustrated by the fact that $(-1)\gamma$ is equivalent to $-\gamma$, the path
$\gamma$ traversed in the reverse direction (see Exercise 11.1.8).


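Example 11.7.15. Show that if $\gamma_1$ and $\gamma_2$ are two paths with parameter interval
$I = [0,1]$ and if $\gamma_1(1) = \gamma_2(0)$ (so that $\gamma_2$ starts where $\gamma_1$ ends), then the chain
$\Gamma = \gamma_1 + \gamma_2$ is equivalent to the chain consisting of the single path $\gamma$ which is $\gamma_1$
and $\gamma_2$ spliced together, that is
\[
\gamma(t) = \begin{cases} \gamma_1(2t) & \text{if } 0 \le t \le 1/2, \\ \gamma_2(2t-1) & \text{if } 1/2 \le t \le 1. \end{cases}
\]
Solution: Note that $\gamma(I) = \gamma_1(I) \cup \gamma_2(I) = \Gamma(I)$. On $[0,1/2]$, $\gamma$ is obtained
from $\gamma_1$ by the smooth parameter change $t \to 2t$, while on $[1/2,1]$, $\gamma$ is obtained from
$\gamma_2$ by the smooth parameter change $t \to 2t-1$. Thus, for any 1-form $\phi$ on the trace,

\[
\int_\gamma \phi = \int_0^1 \phi(\gamma(t)) \cdot \gamma'(t)\,dt
= \int_0^{1/2} \phi(\gamma(t)) \cdot \gamma'(t)\,dt + \int_{1/2}^1 \phi(\gamma(t)) \cdot \gamma'(t)\,dt
= \int_{\gamma_1} \phi + \int_{\gamma_2} \phi = \int_\Gamma \phi
\]

and, hence, $\gamma$ and $\Gamma$ are equivalent chains.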
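The splicing identity can be illustrated numerically. The sketch below is only an illustration under assumed data: the two paths, the 1-form $-y\,dx + x\,dy$, and the helper `line_integral` (a midpoint-rule approximation) are hypothetical choices, not taken from the text.

```python
def line_integral(path, form, steps):
    """Midpoint-rule approximation of the integral of P dx + Q dy over a path on [0, 1]."""
    total = 0.0
    for k in range(steps):
        t0, t1 = k / steps, (k + 1) / steps
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        P, Q = form(*path((t0 + t1) / 2))
        total += P * (x1 - x0) + Q * (y1 - y0)
    return total

def gamma1(t):               # assumed example path, ends at (1, 1)
    return (t, t * t)

def gamma2(t):               # assumed example path, starts at (1, 1)
    return (1 + t, 1 - t)

def spliced(t):              # gamma1 followed by gamma2, as in the example
    return gamma1(2 * t) if t <= 0.5 else gamma2(2 * t - 1)

def form(x, y):              # coefficients (P, Q) of the 1-form -y dx + x dy
    return (-y, x)

whole = line_integral(spliced, form, 2000)
parts = line_integral(gamma1, form, 1000) + line_integral(gamma2, form, 1000)
```

With these choices the exact value of both sides is $1/3 - 2 = -5/3$, and `whole` and `parts` agree up to rounding.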


The General Stokes Theorem. 


Theorem 11.7.16. If $\phi$ is a smooth $(p-1)$-form defined on $I^p$, then

\[
\int_{\partial I^p} \phi = \int_{I^p} d\phi.
\]


We won’t go through the proof here. It is very much like the proof of the p = 3 
version of the theorem, which was proved earlier (Theorem 11.6.3). It is a simple 
application of the Fundamental Theorem of Calculus. 


This leads us to the general version of Stokes’s Theorem. 




Theorem 11.7.17. Let $C$ be a $p$-chain in $\mathbb{R}^q$ and let $\phi$ be a smooth $(p-1)$-form
defined on the trace of $C$. Then

\[
\int_{\partial C} \phi = \int_C d\phi. \tag{11.7.5}
\]


Proof. If $C$ is a single cell $E$, then this follows, as with earlier versions, from the
previous theorem and the identities

\[
\int_{\partial E} \phi = \int_{\partial I^p} E^*(\phi) \quad\text{and}\quad \int_E d\phi = \int_{I^p} dE^*(\phi).
\]

The proof for general chains now follows from the fact that both sides of (11.7.5)
are linear in $C$. That is, if $C$ is a certain linear combination of cells $E_j$, then the
integrals in the formula are the corresponding linear combinations of the integrals
with $C$ replaced by $E_j$. □


If $C$ is a single cell, then the above theorem is Green’s Theorem when $p = 2$ and
$q = 2$, it is the dimension 2 Stokes’s Theorem when $p = 2$ and $q > 2$, it is Gauss’s
Theorem when $p = 3$ and $q = 3$, and it is the dimension 3 Stokes’s Theorem when
$p = 3$ and $q > 3$. In the case where $p = 1$ and $q = 1$ it is the Fundamental Theorem
of Calculus, and when $p = 1$ and $q > 1$ it is the Fundamental Theorem of Calculus
for path integrals.

The following are simple corollaries of the general Stokes Theorem. The proofs 
are left to the exercises. 


Corollary 11.7.18. If $C$ is a $p$-cycle and $\phi$ is a smooth $(p-1)$-form on the trace
of $C$, then
\[
\int_C d\phi = 0.
\]

Recall that a closed $p$-form $\phi$ is a $p$-form such that $d\phi = 0$.

Corollary 11.7.19. If $C$ is a $p$-chain and $\phi$ is a smooth closed $(p-1)$-form on the trace
of $C$, then
\[
\int_{\partial C} \phi = 0.
\]


Exercise Set 11.7

1. If $E_1$, $E_2$, and $E_3$ are three distinct $p$-cells in $\mathbb{R}^q$, express the sum of the chains
$2E_1 + E_2 - 3E_3$ and $-5E_1 - E_2 + 3E_3$ in reduced form.

2. Express the sum of the 0-chains $C_1 = 2\{-3\} - 4\{1\} + \{2\}$ and $C_2 = 3\{1\} - \{2\}$
in reduced form.

3. For $t \in [0,1]$ let $\gamma_1(t) = (2t - {}\ldots)$, $\gamma_2(t) = (1-t, t)$, $\gamma_3(t) = (t-1, t)$. Which
of the following 1-chains in $\mathbb{R}^2$ is a cycle?
(a) $\gamma_1 + \gamma_2 + \gamma_3$.
(b) $\gamma_1 + \gamma_2 - \gamma_3$.
(c) $\gamma_1 + 2\gamma_2 - 3\gamma_3$.




4. Let $E(r,\theta) = (r\cos 2\pi\theta, r\sin 2\pi\theta)$ for $(r,\theta) \in I^2$. Show that $E$ is a simple cell
and explicitly describe the cycle $\partial E$.

5. Let $A$ be the triangle in $\mathbb{R}^2$ with vertices at $(0,0)$, $(1,0)$, and $(0,2)$. Express
this triangle as the trace of a 2-cell $E$ and find the cycle $\partial E$.


6. Find the integral of the 1-form 2ry? dx + 322y? dy over the L-cycle of the pre- 
vious exercise. 

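7. For $t \in [0,1]$, let $\gamma_1(t) = (2t-1, 0)$ and $\gamma_2(t) = (\cos(\pi t), \sin(\pi t))$ and define a
1-chain $\Gamma$ in $\mathbb{R}^2$ by $\Gamma = \gamma_1 + \gamma_2$. For the 1-form $\phi(x,y) = 3x\,dx + 2y\,dy$, find
$\int_\Gamma \phi$.

8. Find $\int_\Gamma \psi$ if $\Gamma$ is the 1-chain of the previous exercise and $\psi = x\,dy$.

9. Define 2-cells in $\mathbb{R}^2$ as follows. For $(s,t) \in I^2$,

\[
\begin{aligned}
E(s,t) &= ((1+s)\cos \pi t,\ (1+s)\sin \pi t),\\
F(s,t) &= ((1+s)\cos \pi t,\ -(1+s)\sin \pi t).
\end{aligned}
\]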
If C is the 2-chain E+ F, then find 8C and f,,,(e" dx + sin(y?)) dy. 

10. In $\mathbb{R}^2$ define a smooth path $\gamma_r(t) = (r\cos(2\pi t), r\sin(2\pi t))$ for each $r > 0$. If $\phi$
is a 1-form on $\mathbb{R}^2 \setminus \{0\}$ such that $d\phi = 0$, then show that $\int_{\gamma_r} \phi$ is independent
of $r$. Hint: For $0 < s < r$, consider the cycle $\Gamma = \gamma_r - \gamma_s$. Is this $\partial E$ for some
2-cell $E$?

11. Let $\phi$ be a smooth 2-form on $\mathbb{R}^3 \setminus \{0\}$ which satisfies $d\phi = 0$. Show that $\int_{\partial E} \phi$
is the same number for all simple 3-cells $E$ such that 0 is in the image of the
interior of $I^3$ under $E$ but not in the image of $\partial I^3$ under $E$. On the other hand,
if 0 is not in the trace of $E$ at all, then this integral is 0. Hint: For the first
part, show that for any such $E$, the integral over the boundary of $E$ is the same
as the integral over any sufficiently small hollow sphere centered at 0.

12. Prove (11.7.4).

13. Suppose $\gamma$ is a path and $\Gamma = \gamma_1 + \cdots + \gamma_n$ is the 1-chain made by breaking $\gamma$ up
into smooth paths and reparameterizing each of them so that it has parameter
interval $[0,1]$. Show that $\Gamma$ is a cycle if and only if $\gamma$ is a closed path.


14. Prove that if $\Gamma$ is a 1-cycle, then $\Gamma$ is equivalent to a 1-cycle with all of its
summands closed paths. Hint: Use repeated application of the following idea:
if one path begins where another one ends, then the two can be joined together
to form a single path which is equivalent to the sum of the two individual paths.

15. Let $\phi$ be a smooth 2-form on $\mathbb{R}^3 \setminus Z$, where $Z$ is the set of integers on the $x$-axis.
Let $\gamma$ be a positively oriented parameterization of the sphere of radius $n + 1/2$
centered at the origin, and for $j = -n,\dots,n$ let $\gamma_j$ be a positively oriented
parameterization of the sphere of radius $1/3$ centered at $j$. Then let $\Gamma$ be the
cycle $\Gamma = \sum_{j=-n}^{n} \gamma_j$. If $d\phi = 0$, show that

\[
\int_\gamma \phi = \int_\Gamma \phi.
\]
Appendix 


Degrees of Infinity 


In Exercise 2.6.9 we used the fact that there is a sequence of rational numbers in 
which each rational number appears exactly once. In other words, the set of natural 
numbers is the same size as the set of rational numbers. On the other hand, in 
Section 1.4 we asserted that the set of irrational numbers is much larger than the 
set of rational numbers. Do these last two statements even make sense? In this 
appendix we will show that, properly interpreted, they do make sense and they 
are true. This involves the study of the relative size of sets — that is, the study of 
cardinality. 


A.1. Cardinality of Sets 


For finite sets the notion of cardinality is easy. A set $S$ is finite if, for some $n \in \mathbb{N}$,
the elements of $S$ can be put into a one-to-one correspondence with the set $\mathbb{N}_n =
\{k \in \mathbb{N} : 1 \le k \le n\}$. This means that we can count the elements of $S$ using the
integers from 1 to $n$. In this case we say that $S$ has cardinal $n$.


It is common in mathematics and particularly in set theory to use the follow- 
ing somewhat more economical terminology to replace the terms “one-to-one” and 
“onto”: 


Definition A.1.1. A function $f : A \to B$ is said to be injective if it is one-to-one.
It is said to be surjective if it is onto (maps $A$ onto $B$). It is said to be bijective
if it is both. A function which is injective is said to be an injection; one which is
surjective is called a surjection; one which is bijective is called a bijection.

Thus, a set $S$ is finite with cardinal $n$ if there is a bijection from $\mathbb{N}_n$ to $S$. An
infinite set is a set which is not finite.


Theorem A.1.2. If $A$ is a finite set with cardinal $n$ and $B$ is a finite set with
cardinal $m$, then there is an injection $h : A \to B$ if and only if $n \le m$.




Proof. By definition, there are bijections $f : \mathbb{N}_n \to A$ and $g : \mathbb{N}_m \to B$. If
$n \le m$, then $\mathbb{N}_n \subset \mathbb{N}_m$. In this case, the inverse function $f^{-1} : A \to \mathbb{N}_n$, followed
by the inclusion of $\mathbb{N}_n$ into $\mathbb{N}_m$, followed by $g : \mathbb{N}_m \to B$, results in an injection
$g \circ f^{-1} : A \to B$.

On the other hand, if there is an injection $h : A \to B$, then $p = g^{-1} \circ h \circ f$ is
an injection from $\mathbb{N}_n$ to $\mathbb{N}_m$. We will show by induction on $n$ that the existence of
such an injection $p : \mathbb{N}_n \to \mathbb{N}_m$ implies that $n \le m$.

This is obviously true if $n = 1$, since $1 \le m$ for every $m \in \mathbb{N}$. Suppose that it is
true for $n-1$ (with $n-1 \ge 1$). That is, assume that whenever there is an injection
$p : \mathbb{N}_{n-1} \to \mathbb{N}_k$, for some $k \in \mathbb{N}$, then $n-1 \le k$. Let $p : \mathbb{N}_n \to \mathbb{N}_m$ be an injection for
some $m \in \mathbb{N}$. If $p(n) = m$, we set $q = p$. If $p(n) \ne m$, we interchange $p(n)$ and $m$.
The function $p$ composed with this interchange results in an injection $q : \mathbb{N}_n \to \mathbb{N}_m$
with $q(n) = m$. Since $q$ is an injection, it must map the set of natural numbers less
than $n$ into the set of natural numbers less than $m$. That is, $q$ restricted to $\mathbb{N}_{n-1}$
is an injection from $\mathbb{N}_{n-1}$ into $\mathbb{N}_{m-1}$. By the induction hypothesis, $n-1 \le m-1$.
We conclude that $n \le m$. This concludes the induction. □
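The interchange step in the induction can be made concrete. The following is a small sketch under assumed conventions: an injection from $\{1,\dots,n\}$ to $\{1,\dots,m\}$ is stored as a dictionary, and the function name `normalize` is hypothetical.

```python
def normalize(p, n, m):
    """Given an injection p: {1..n} -> {1..m} as a dict, return q with q(n) = m.

    q is p followed by the interchange of the two values p(n) and m.
    """
    target = p[n]

    def swap(v):
        if v == target:
            return m
        if v == m:
            return target
        return v

    return {k: swap(p[k]) for k in range(1, n + 1)}

# Example with n = 3 and m = 5: p(3) = 2 is interchanged with 5.
p = {1: 4, 2: 5, 3: 2}
q = normalize(p, 3, 5)
```

Here `q` sends 3 to 5, and its restriction to $\{1, 2\}$ takes values in $\{1,\dots,4\}$, which is the situation the induction hypothesis is applied to.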


Note that if A is a finite set, if B is a set, and if there is a bijection from A 
to B, then B is also finite and A and B have the same cardinal. This leads to the 
following corollary of the previous theorem. 


Corollary A.1.3. If $A$ and $B$ are finite sets and $f : A \to B$ is an injection which
is not a bijection, then there is no bijection from $A$ to $B$ and, hence, the cardinal
of $A$ is less than the cardinal of $B$.


Proof. Let $A$ have cardinal $n$ and let $g : \mathbb{N}_n \to A$ be a bijection. Since $f$ is not a
bijection, there is an element $b \in B$ which is not in the image of $f$. We define an
injection $h : \mathbb{N}_{n+1} \to B$ by setting $h(p) = f \circ g(p)$ for $p \le n$ and setting
$h(n+1) = b$. Since there is no $a \in A$ for which $f(a) = b$, there is no $p \le n$ for
which $h(p) = h(n+1)$. Thus, $h$ is, indeed, an injection. The previous theorem
implies that $n+1$ is less than or equal to the cardinal of $B$. If there were a bijection
from $A$ to $B$, then $B$ would also have cardinal $n$, and we would have the
contradiction $n + 1 \le n$. Hence, there is no such bijection.

Since there is no bijection from $A$ to $B$, the two sets do not have the same
cardinal. Hence, the cardinal of $A$ must be less than the cardinal of $B$. This
completes the proof. □


In particular, if A is a proper subset of a finite set B, then there is no bijection 
from A to B and A has smaller cardinal than B. 


The situation is much different for infinite sets. For example, there is a bijection
$n \to 2n$ between $\mathbb{N}$ and its proper subset consisting of the even natural numbers.


Dominance and Similarity. The above discussion of finite sets suggests the 
following definitions for sets in general: 


Definition A.1.4. We will say that sets $A$ and $B$ are similar if there is a bijection
from $A$ to $B$. We denote this by $A \sim B$. We will say that $A$ is dominated by $B$
if there is an injection from $A$ into $B$. We denote this by $A \preceq B$. If $A \preceq B$ but
$A \not\sim B$, then we will write $A \prec B$.




Theorem A.1.2 says that, for finite sets $A$ and $B$, $A \preceq B$ if and only if the
cardinal of $A$ is less than or equal to the cardinal of $B$ (that is, if $B$ has at least
as many elements as $A$). Similarly, $A \sim B$ if and only if $A$ and $B$ have the same
cardinality (same number of elements), and $A \prec B$ if the cardinal of $A$ is less than
the cardinal of $B$ ($A$ has fewer elements than $B$). We will use the same terminology
in the case of sets that may not be finite; that is, we will say that $A$ has the same
cardinal as $B$ if $A \sim B$ and we will say that $A$ has cardinal less than that of $B$ if
$A \prec B$.

Corollary A.1.3 says that if there is an injection $f : A \to B$ between two finite
sets and if $f$ is not a bijection, then $A \prec B$.

The relation $A \preceq B$ behaves like an order relation. It is easy to see that it is
transitive, meaning that if $A \preceq B$ and $B \preceq C$, then $A \preceq C$. We leave the proof
of this to the exercises. It is also true but not so easy to prove that if $A \preceq B$ and
$B \preceq A$, then $A \sim B$. This is the Schröder-Bernstein Theorem.


The Schröder-Bernstein Theorem.


Theorem A.1.5. If $A$ and $B$ are sets such that $A \preceq B$ and $B \preceq A$, then $A \sim B$.


Proof. If $A \preceq B$ and $B \preceq A$, then there are injections $f : A \to B$ and $g : B \to A$.
We note that if $C = g(B)$, then $g$ determines a bijection from $B$ to $C \subset A$. We will
construct a bijection from $A$ to $C$. Then the composition of this with $g^{-1} : C \to B$
will be a bijection from $A$ to $B$, proving that $A \sim B$.

We set $h = g \circ f$. Then $h$ is an injection of $A$ to itself and its image is contained
in $C$. Thus, we have
\[
A \supset C \supset h(A).
\]
By repeatedly applying $h$ to this triple we obtain a nested sequence
\[
A \supset C \supset h(A) \supset h(C) \supset h^2(A) \supset h^2(C) \supset \cdots, \tag{A.1.1}
\]
where $h^2 = h \circ h$.

Now, of course, if $C = A$, we are done. If $C \ne A$, then $S_1 = A \setminus C$ is non-empty.
We define a sequence of sets $\{S_n\}$ inductively by setting $S_{n+1} = h(S_n)$ for each
$n \ge 1$. It follows from (A.1.1) that this is a pairwise disjoint sequence of sets; that
is, $S_n \cap S_m = \emptyset$ if $n \ne m$. Furthermore, $h : S_n \to S_{n+1}$ is a bijection for each $n$.
We set $S = \bigcup_n S_n$ and define a map $q : A \to C$ by
\[
q(x) = h(x) \ \text{ if } x \in S \quad\text{and}\quad q(x) = x \ \text{ if } x \in A \setminus S.
\]

This is a bijection, since $h$ is a bijection of $S$ onto $S \cap C$ and
\[
A \setminus S = C \setminus (S \cap C).
\]
This completes the proof. □
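The construction in this proof can be carried out explicitly in a simple case. The sketch below is an illustration under assumptions that are not in the text: both sets are taken to be $\{0, 1, 2, \dots\}$, the injections are $f(a) = a + 1$ and $g(b) = b + 1$, and all function names are hypothetical. The membership test for $S$ walks backwards through $h$; it terminates here because each step decreases the value, which need not happen for other choices of $f$ and $g$.

```python
def f(a):                 # assumed injection A -> B
    return a + 1

def g(b):                 # assumed injection B -> A
    return b + 1

def f_inv(b):             # partial inverse of f; None if b is not in f(A)
    return b - 1 if b >= 1 else None

def g_inv(a):             # partial inverse of g; None if a is not in C = g(B)
    return a - 1 if a >= 1 else None

def h_inv(a):
    """Partial inverse of h = g o f; None if a is not in h(A)."""
    b = g_inv(a)
    return None if b is None else f_inv(b)

def in_S(x):
    """Decide whether x lies in S, the union of S_1 = A minus C and its images under h."""
    while True:
        if g_inv(x) is None:      # x is outside C, so x is in S_1
            return True
        prev = h_inv(x)
        if prev is None:          # x is in C but not in h(A): x is in no S_n
            return False
        x = prev                  # x = h(prev), and x is in S iff prev is

def q(x):
    """The bijection A -> C from the proof: h on S, the identity elsewhere."""
    return g(f(x)) if in_S(x) else x

def bijection(x):
    """The resulting bijection A -> B, namely g^{-1} composed with q."""
    return g_inv(q(x))
```

For this choice of $f$ and $g$ the set $S$ consists of the even numbers, and `bijection` interchanges $2k$ and $2k+1$ for each $k$.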


Using the above theorem, one can easily prove that the relation “$\prec$” is transitive.


Corollary A.1.6. If $A \prec B$ and $B \prec C$, then $A \prec C$.


The proof is left to the exercises. 




Exercise Set A.1

1. Prove that a subset of $\mathbb{N}$ which is bounded above is finite (remember, a set is
finite if and only if it is similar to $\mathbb{N}_n$ for some $n \in \mathbb{N}$).

2. Prove that $\mathbb{N}$ is not finite.

3. Prove that $\mathbb{Z} \sim \mathbb{N}$.

4. Prove Corollary A.1.6 by proving the following stronger result: if $A \preceq B$ and
$B \prec C$, then $A \prec C$.

5. If $S$ is a finite set of cardinal $n$, what is the cardinal of the set of all subsets
of $S$?


A.2. Countable Sets 


A set $S$ is said to be countable if it can be counted; that is, it is countable if
there is a way of assigning, in order, a distinct natural number to each element of
$S$, beginning with 1 and continuing forever or until we run out of elements of $S$.
More precisely, $S$ is countable if it is similar to one of the sets $\mathbb{N}_n$ or to $\mathbb{N}$ itself. If
it is similar to $\mathbb{N}$, then it is both countable and infinite. We say that it is countably
infinite in this case.


Theorem A.2.1. The Cartesian product $\mathbb{N} \times \mathbb{N}$ satisfies $\mathbb{N} \times \mathbb{N} \sim \mathbb{N}$. Hence, it is
countably infinite.


Proof. There are many ways to prove this. Figure A.2.1 shows one way to count
$\mathbb{N} \times \mathbb{N}$, that is, to define a bijection from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$.

Another proof uses the fact that each natural number has a unique factorization
as a product of primes. Focusing on the prime 2, this implies that each element of
$\mathbb{N}$ has a unique factorization of the form $2^{p-1}(2q-1)$ where $p, q \in \mathbb{N}$. Note that,
in this expression, $2q-1$ runs through all odd natural numbers, while $2^{p-1}$ runs through
all non-negative powers of 2. Thus, $f(p,q) = 2^{p-1}(2q-1)$ defines a bijection from
$\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$. Another proof appears in the exercises. □
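The bijection $f(p,q) = 2^{p-1}(2q-1)$ and its inverse are easy to compute. The following is a minimal sketch; the function names `pair` and `unpair` are hypothetical.

```python
def pair(p, q):
    """The map f(p, q) = 2^(p-1) * (2q - 1) for natural numbers p, q >= 1."""
    return 2 ** (p - 1) * (2 * q - 1)

def unpair(n):
    """Inverse of pair: split n >= 1 into its power of 2 and its odd part."""
    p = 1
    while n % 2 == 0:
        n //= 2
        p += 1
    return p, (n + 1) // 2

# Round trip on an example: 36 = 2^2 * 9 = 2^(3-1) * (2*5 - 1).
example = unpair(pair(3, 5))
```

Since `unpair` recovers $(p, q)$ from $f(p,q)$ and `pair` recovers $n$ from `unpair(n)`, the two maps are mutually inverse on the ranges tested.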


Countability of the Rational Numbers. 
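Theorem A.2.2. The set $\mathbb{Q}$ is countably infinite, that is, $\mathbb{Q} \sim \mathbb{N}$.


Proof. There is an injection $f : \mathbb{Q} \to \mathbb{N} \times \mathbb{N}$. This is defined as follows: we begin
by expressing each non-zero rational number in lowest terms and with a positive
denominator. For such an expression $n/m$ we set $f(n/m) = (2n, m)$ if $n > 0$ and
$f(n/m) = (-2n+1, m)$ if $n < 0$. Finally, we set $f(0) = (1,1)$. This clearly defines
an injection $f : \mathbb{Q} \to \mathbb{N} \times \mathbb{N}$. Composing $f$ with a bijection from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$
(Theorem A.2.1) gives an injection of $\mathbb{Q}$ into $\mathbb{N}$, so $\mathbb{Q} \preceq \mathbb{N}$.

On the other hand, there is an obvious injection $g : \mathbb{N} \to \mathbb{Q}$: we just define
$g(n) = n$. By Theorem A.1.5, $\mathbb{Q} \sim \mathbb{N}$, and so $\mathbb{Q}$ is countably infinite.

A more direct proof of this result appears in the exercises. □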
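The injection of the rationals into pairs of natural numbers can be tried out with Python's `fractions.Fraction`, which already stores each rational in lowest terms with a positive denominator. This is only a sketch; the function name `encode` and the sample of rationals are hypothetical.

```python
from fractions import Fraction

def encode(r):
    """The injection of the proof: n/m in lowest terms goes to a pair of naturals."""
    n, m = r.numerator, r.denominator
    if n > 0:
        return (2 * n, m)          # positive numerators become even first entries
    if n < 0:
        return (-2 * n + 1, m)     # negative numerators become odd first entries >= 3
    return (1, 1)                  # zero takes the one remaining odd value

# A finite sample of rationals; the set removes repeats such as 1/2 = 2/4.
sample = {Fraction(a, b) for a in range(-20, 21) for b in range(1, 21)}
codes = {encode(r) for r in sample}
```

Distinct rationals in the sample receive distinct codes, so `codes` has as many elements as `sample`.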




Figure A.2.1. How to Count the Elements of N x N. 


Theorem A.2.3. Every subset $S \subset \mathbb{N}$ is countable.


Proof. If $S$ is finite, then it is countable. If $S$ is not finite, we will define an
increasing sequence $\{s_n\}$ consisting of the elements of $S$.

By the well-ordering principle for $\mathbb{N}$ (Exercise 1.2.19) every non-empty subset
of $\mathbb{N}$ has a smallest element. We define the increasing sequence $\{s_n\}$ by induction.
We let $s_1$ be the smallest element of $S$ and specify that $s_{n+1}$ is to be the smallest
element of $S$ that is greater than $s_n$. There is one because, otherwise, $S$ would be
bounded, hence, finite (Exercise A.1.1).

An induction argument shows that if $s \in S \setminus \{s_1, s_2, \dots, s_n\}$, then $s > s_n \ge n$.
It follows that every element of $S$ appears in the sequence $\{s_n\}$. Thus, $k \to s_k$ is a
bijection from $\mathbb{N}$ to $S$. We conclude that $S$ is either finite (similar to $\mathbb{N}_n$ for some
$n$) or it is similar to $\mathbb{N}$ itself. Thus, $S$ is countable. □


Countable Families of Sets. 


Theorem A.2.4. A set $S$ is countable if and only if there is a surjection $f : \mathbb{N} \to S$.


Proof. We leave the proof that $S$ countable implies that there is such a surjection
$f$ to the exercises.

For the reverse direction, we suppose there is a surjection $f : \mathbb{N} \to S$. This
means that, for each $s \in S$, the set $f^{-1}(s)$ is non-empty. We define $g(s)$ to be the
smallest element of $f^{-1}(s)$. Then $g : S \to \mathbb{N}$ is an injection and, by the previous
theorem, this means that $S$ is countable. □
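The passage from a surjection to an injection can be illustrated on a finite stretch of values. In the sketch below, the list `values` stands in for $f(1), f(2), \dots$ and the function name is hypothetical; it records, for each element, the smallest index at which it occurs.

```python
def injection_from_surjection(values):
    """values[k-1] plays the role of f(k); return g with g[s] = smallest k such that f(k) = s."""
    g = {}
    for k, s in enumerate(values, start=1):
        g.setdefault(s, k)   # keep only the first (smallest) index for each s
    return g

# The letters of "banana" listed with repetition, as a surjection onto {b, a, n}.
g = injection_from_surjection(list("banana"))
```

Different elements get different indices, since an index $k$ can be the smallest preimage of only one element, namely $f(k)$.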


Theorem A.2.5. The union of a countable family of countable sets is countable. 


Proof. Note that the sets in the family need not be disjoint; they may overlap.
A family of sets is countable if it has the form $\{S_k\}$ where $k$ ranges over $\mathbb{N}$ (if the
family is infinite) or $\mathbb{N}_n$ for some $n$ (if the family is finite). For each $k$ we define
a function $f_k : \mathbb{N} \to S_k$ to be the surjection that is guaranteed by the previous




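theorem. Then $(k,j) \to f_k(j)$ is a surjection of $\mathbb{N} \times \mathbb{N}$ onto $\bigcup_k S_k$. If we compose
this with a bijection $\mathbb{N} \to \mathbb{N} \times \mathbb{N}$, we obtain a surjection $\mathbb{N} \to \bigcup_k S_k$. Thus, the set
$\bigcup_k S_k$ is countable. □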
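The enumeration of a countable union can be sketched as follows. The family used here, $S_k$ = the positive multiples of $k$ with $f_k(j) = kj$, is an assumed example, and the names `unpair`, `f_k`, and `enumerate_union` are hypothetical; `unpair` is the inverse of the bijection $(p,q) \to 2^{p-1}(2q-1)$.

```python
from itertools import count, islice

def unpair(n):
    """A bijection N -> N x N: write n = 2^(k-1) * (2j - 1) and return (k, j)."""
    k = 1
    while n % 2 == 0:
        n //= 2
        k += 1
    return k, (n + 1) // 2

def f_k(k, j):
    """Assumed example surjection f_k: N -> S_k, with S_k the positive multiples of k."""
    return k * j

def enumerate_union():
    """List the union of the sets S_k by running through N and decoding each n as (k, j)."""
    for n in count(1):
        k, j = unpair(n)
        yield f_k(k, j)

first_terms = list(islice(enumerate_union(), 40))
```

Elements may be listed more than once, which is harmless: the composite map only needs to be a surjection onto the union.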


Exercise Set A.2

1. Prove the first part of Theorem A.2.4, that is, prove that if $S$ is countable,
then there is a surjection $f : \mathbb{N} \to S$.

2. Prove that the set of all finite subsets of $\mathbb{N}$ is countable.

3. Prove that the Cartesian product of any finite number of copies of $\mathbb{N}$ is countably
infinite.

4. Prove that the set of possible words that can be formed using twenty-six letters
is countably infinite.


A.3. Uncountable Sets 


As far as we know at this stage of the game, all infinite sets might be countably
infinite. The next theorem shows that this is not the case. It shows that for every
set $A$ there is always a set $B$ with a larger cardinal, that is, a set $B$ with $A \prec B$.


Theorem A.3.1. If $A$ is a set and the set consisting of all subsets of $A$ is denoted
$2^A$, then $A \prec 2^A$.


Proof. Obviously, $A \preceq 2^A$, since the function which sends each point of $A$ to the
singleton subset of $A$ consisting of that point is an injection of $A$ into $2^A$. Thus, to
prove the theorem, we just need to show that $A$ is not similar to $2^A$.

Suppose $A$ is similar to $2^A$, that is, suppose there is a bijection $\alpha : A \to 2^A$.
Consider the subset $B$ of $A$ consisting of all $a \in A$ such that $a \notin \alpha(a)$. Then
$B = \alpha(b)$ for some $b \in A$ since $\alpha$ is a bijection.

If $b \in B$, then $b \notin \alpha(b)$ by the definition of $B$. But $B = \alpha(b)$, so this is a
contradiction.

If $b \notin B$, then $b \in \alpha(b)$ again by the definition of $B$. But $B = \alpha(b)$ and so this
is also a contradiction.

Thus, since both possible locations for $b$ lead to a contradiction, no such bijection
$\alpha$ exists. □
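For a small finite set the diagonal argument can be checked exhaustively. The sketch below assumes a three-element set and tries every map from it to its set of subsets; the names `powerset` and `diagonal_set` are hypothetical.

```python
from itertools import combinations, product

def powerset(A):
    """All subsets of the finite collection A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def diagonal_set(A, alpha):
    """The set B of the proof: all a in A with a not in alpha(a)."""
    return frozenset(a for a in A if a not in alpha[a])

A = [0, 1, 2]
subsets = powerset(A)

# Try every map alpha: A -> 2^A (there are 8^3 = 512 of them) and confirm
# that the diagonal set is never one of the values of alpha.
missed_every_time = all(
    diagonal_set(A, dict(zip(A, images))) not in images
    for images in product(subsets, repeat=len(A))
)
```

The check shows that no map from this set to its subsets is onto, which is the finite shadow of the theorem; the proof itself does not depend on finiteness.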


The notation $2^A$ for the set of all subsets of $A$ comes from a common notation
for the set of all functions $f : A \to B$. This is often denoted $B^A$. The set of all
subsets of $A$ may be thought of as the set of all functions $f$ from $A$ to a set with
two points such as $\{0,1\}$. Here a subset $S$ of $A$ is identified with the function from
$A$ to $\{0,1\}$ which is 1 on $S$ and 0 on the complement of $S$.




The Real Numbers. 


Theorem A.3.2. The set $\mathbb{R}$ of real numbers is similar to $2^{\mathbb{N}}$.


Proof. We will prove that the subset $[0,1)$ of $\mathbb{R}$ is similar to $2^{\mathbb{N}}$. It is then easy to
prove that $[0,1) \sim \mathbb{R}$. We leave this last step to the exercises.

Each real number $x$ in the interval $[0,1)$ has a binary expansion as

\[
.a_1 a_2 a_3 \cdots a_n \cdots = a_1/2 + a_2/4 + a_3/8 + \cdots + a_n/2^n + \cdots,
\]

where each $a_k$ is either 0 or 1. We do not allow expansions which eventually have
all digits equal to 1, that is, expansions such that, for some $j \in \mathbb{N}$, $a_k = 1$ for all
$k > j$, since this would yield either the real number 1 or the same real number as the
expansion in which the last digit that is 0 is replaced by 1 and all the succeeding
digits are replaced by 0. For example, the expansions

\[
.1010111111\cdots \quad\text{and}\quad .1011000000\cdots
\]

both represent the number $11/16$. We choose to use the second one and disallow
the first one. Representing a number in $[0,1)$ in this way defines an injection
$\phi : [0,1) \to 2^{\mathbb{N}}$. We conclude that $[0,1) \preceq 2^{\mathbb{N}}$.

We can also define an injection from $2^{\mathbb{N}}$ to $[0,1)$. We simply map a subset
$A \subset \mathbb{N}$ to $2A$ by sending each $n \in A$ to $2n$. We then send the set $2A$ to its
corresponding binary expansion of a real number in $[0,1)$. The resulting number
will not terminate with all 1’s since all its odd-numbered digits will be 0. The result
is an injection $2^{\mathbb{N}} \to [0,1)$. Thus, $2^{\mathbb{N}} \preceq [0,1)$. By Theorem A.1.5, $[0,1) \sim 2^{\mathbb{N}}$.

Since $[0,1) \sim \mathbb{R}$ (Exercise A.3.1), we conclude that $\mathbb{R} \sim 2^{\mathbb{N}}$. □
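The second injection in this proof can be computed exactly for finite subsets. The sketch below uses exact rational arithmetic; the function name `subset_to_real` is hypothetical, and only finite subsets are handled, since an infinite subset would require an infinite series.

```python
from fractions import Fraction
from itertools import combinations

def subset_to_real(A):
    """Send a finite subset A of N to the number whose binary digit in place 2n is 1 for n in A."""
    return sum((Fraction(1, 2 ** (2 * n)) for n in A), Fraction(0))

# All 64 subsets of {1, ..., 6} and their images in [0, 1).
ground = range(1, 7)
all_subsets = [frozenset(c) for r in range(7) for c in combinations(ground, r)]
images = {subset_to_real(A) for A in all_subsets}

# The two expansions from the text that represent the same number, summed as far as they differ.
second_expansion = Fraction(1, 2) + Fraction(1, 8) + Fraction(1, 16)
```

Because only even-numbered digit places are used, every image is at most $1/4 + 1/16 + \cdots = 1/3$, and distinct subsets give distinct numbers.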


Corollary A.3.3. $\mathbb{N} \sim \mathbb{Q} \prec \mathbb{R}$.


The set R and all sets which are similar to R are said to have the cardinal of 
the continuum. 


The Irrational Numbers. This justifies our statement in Chapter 1 that there
are many more irrational numbers than there are rational numbers. The set $\mathbb{Q}$ of
rational numbers is countable, but the set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers cannot be
countable, for, if it were countable, then $\mathbb{R} = \mathbb{Q} \cup (\mathbb{R} \setminus \mathbb{Q})$ would also be countable by
Theorem A.2.5. Actually it is not hard to show that $\mathbb{R} \setminus \mathbb{Q} \sim \mathbb{R}$. The proof of this
is left to the exercises.


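Exercise Set A.3


1. Prove that $(0,1) \sim \mathbb{R}$ by constructing a bijection from $(0,1)$ to $\mathbb{R}$.

2. Prove that $\mathbb{N} \preceq \mathbb{R} \setminus \mathbb{Q}$ by constructing an injection $\mathbb{N} \to \mathbb{R} \setminus \mathbb{Q}$.

3. Prove that $\mathbb{R} \sim \mathbb{R} \setminus \mathbb{Q}$ by using the previous exercise to construct a bijection
$\mathbb{R} \to \mathbb{R} \setminus \mathbb{Q}$.

4. Prove that if a set $S$ contains a countably infinite set, then $S$ is similar to a
proper subset of itself.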




A.4. The Axiom of Choice 


A finite set is not similar to a proper subset of itself, but both $\mathbb{N}$ and $\mathbb{R}$ are similar
to proper subsets of themselves. Is this true of all infinite sets? By Exercise A.3.4,
this is true of any infinite set which has a countably infinite subset. Does every
infinite set $X$ contain a countably infinite subset (a copy of $\mathbb{N}$)? It seems obvious
that this is the case: just choose a sequence of elements using induction in the
following way: choose an element $x_1 \in X$ and then, assuming $x_1, x_2, \dots, x_n$ have
been chosen, choose $x_{n+1}$ to be any element of $X \setminus \{x_1, x_2, \dots, x_n\}$. There will
always be such an element since the set $X$ is infinite.

Unfortunately, this argument is not a valid use of the theorem on inductive
definitions (Theorem 1.2.3). This is because we do not have a function that specifies
the next element $x_{n+1}$, given the elements $x_1, x_2, \dots, x_n$. The above argument just
says to pick one. This problem cannot be fixed without introducing an additional
axiom into set theory. This is the Axiom of Choice:


AC. Given a collection of non-empty sets, there exists a function which assigns to 
each set in the collection an element of that set. 


Using the Axiom of Choice, we can complete the above discussion and prove 
the following theorem. 


Theorem A.4.1. If $S$ is an infinite set, then $\mathbb{N} \preceq S$.


Proof. Consider the collection of all sets of the form $S \setminus F$ where $F$ is a finite
subset of $S$. Each of these sets is non-empty because $S$ is infinite. By the Axiom of
Choice, there is a function which assigns to each set in this collection an element
of this set. We can think of this as a function which assigns to each finite subset
$F \subset S$ an element $\phi(F) \in S \setminus F$.

We inductively define an injection $n \to x_n : \mathbb{N} \to S$. We set $x_1 = \phi(\emptyset)$ and
impose the recursion relation $x_{n+1} = \phi(\{x_1, x_2, \dots, x_n\})$. The sequence defined
this way is an injection of $\mathbb{N}$ into $S$ because, for each $n$, $x_{n+1}$ is in the complement
of $\{x_1, x_2, \dots, x_n\}$. Thus, $\mathbb{N} \preceq S$. □
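The recursion $x_{n+1} = \phi(\{x_1,\dots,x_n\})$ can be run whenever a choice function is given by an explicit rule. The sketch below takes $S$ to be the set of integers with an assumed rule for $\phi$; in this situation the Axiom of Choice is not needed, so the code only illustrates how the recursion uses $\phi$, not why the axiom is required in general. The names `phi` and `sequence` are hypothetical.

```python
def phi(F):
    """Assumed explicit choice rule on the integers: the integer of least absolute
    value that is not in the finite set F, preferring the positive one on ties."""
    k = 0
    while True:
        for candidate in ((k, -k) if k else (0,)):
            if candidate not in F:
                return candidate
        k += 1

def sequence(n):
    """The first n terms x_1, ..., x_n defined by x_1 = phi(empty set) and
    x_{k+1} = phi({x_1, ..., x_k})."""
    xs = []
    for _ in range(n):
        xs.append(phi(set(xs)))
    return xs

first_five = sequence(5)
```

Each new term is chosen outside the set of earlier terms, so the terms are distinct, which is exactly the injectivity argument in the proof.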


As pointed out in Exercise A.3.4, the above theorem has the following corollary. 


Corollary A.4.2. A set is infinite if and only if it is similar to a proper subset of 
itself. 


Russell’s Paradox. It is very easy to get into trouble using set theory. The
following argument is known as Russell’s paradox. It was discovered in 1901 by
Bertrand Russell.


Let $A$ be the set of all sets that are not elements of themselves. If $A \in A$, then
$A \notin A$. On the other hand, if $A \notin A$, then $A \in A$. Thus, we have a very serious
contradiction here.


This illustrates that we cannot allow just any describable collection to be a set. 
Clearly we cannot allow the set of all sets which are not elements of themselves to 
be a set. For similar reasons, we cannot allow the set of all sets to be a set. But 
what describable collections can we allow to be sets? We need rules that tell us 


what constructions result in sets. 




Axioms of Set Theory. The commonly accepted such rules are called the
Zermelo-Fraenkel (ZF) axioms of set theory. In this system, the only objects are
sets and there is one relation, membership. That is, a set $A$ may or may not be
a member (or element) of another set B (being a member of B is not the same 
thing as being a subset of B). If A is a member of B, then we write A € B. The 
ZF axioms specify that there is a set with no elements (the empty set) and they 
describe allowable operations on existing sets which produce new sets. One of these 
asserts that the set of all subsets of a set is a set. Another axiom restricts sets in 
a way that eliminates the possibility that a set can be an element of itself. The 
axioms create a rich enough set theory that we may construct N, Q, and R as sets. 
We won't attempt to describe and explain the ZF axioms here, beyond the above 
comments. To do so would take us far beyond the scope of this text. 


The ZF axioms do not include the Axiom of Choice. For a time, mathematicians
worried that adding the Axiom of Choice to the ZF axioms would lead to
paradoxes similar to Russell’s paradox. However, Gödel showed that if the ZFC
axioms (Zermelo-Fraenkel axioms plus the Axiom of Choice) lead to a paradox,
then so do the ZF axioms alone. Later, Cohen proved that if the ZF axioms together
with the negation of the Axiom of Choice lead to a paradox, then so do
the ZF axioms alone. In other words, adopting either the Axiom of Choice or its
negation will not introduce any new paradoxes into set theory. Thus, today most
mathematicians are willing to assume the Axiom of Choice and to work with the
ZFC axioms as the foundation of set theory. This results in a much richer set
theory.


Order and the Axiom of Choice. An order relation on a set $S$ is a relation
“$\le$” on pairs of elements of $S$ which satisfies, for $x, y, z \in S$,

(1) reflexivity: $x \le x$;
(2) antisymmetry: $x \le y$ and $y \le x$ implies $x = y$;
(3) transitivity: $x \le y$ and $y \le z$ implies $x \le z$.


A set with an order relation is called a partially ordered set. If it is also true that,
for each pair $x, y \in S$, either $x \le y$ or $y \le x$, then $S$ is said to be linearly ordered or
totally ordered. If a totally ordered set also has the property that every non-empty
subset has a minimal element, then it is said to be well ordered.

In a partially ordered set $S$, a subset of $S$ on which the order is a linear order
is called a chain in $S$.

Perhaps the most useful form of the Axiom of Choice is Zorn’s Lemma. This is 
equivalent to the Axiom of Choice, but here we will just prove that it follows from 


the Axiom of Choice. The proof that Zorn’s Lemma implies the Axiom of Choice 
is left to the exercises. 


Theorem A.4.3 (Zorn’s Lemma). If $S$ is a partially ordered set such that every
chain in $S$ has an upper bound, then $S$ has a maximal element.


Proof. Suppose $S$ is a set. Using the Axiom of Choice, we choose a function $f$
which assigns to each non-empty subset $A \subset S$ an element $f(A) \in A$.




We will say that a subset $C$ of a chain $A$ in $S$ is closed in $A$ if, whenever $y \in C$,
$x \in A$, and $x \le y$, then $x \in C$. If $A$ is a chain in $S$, then we define

\[
\widehat{A} = \{b \in S : a < b \text{ for all } a \in A\}.
\]

In other words, $\widehat{A}$ is the set of all upper bounds for $A$ that don’t belong to $A$. We
define a set $\mathcal{F}$ consisting of all chains $A$ in $S$ such that if $C$ is a closed subset of $A$
and $\widehat{C} \cap A \ne \emptyset$, then $f(\widehat{C})$ is the smallest element of $\widehat{C} \cap A$. The empty set is a
member of $\mathcal{F}$, so $\mathcal{F}$ is not empty.

We claim that if $A$ and $B$ are both members of $\mathcal{F}$, then $A$ is a closed subset of
$B$ or $B$ is a closed subset of $A$.

If $B$ is not a subset of $A$, then there is an element $b \in B$ which is not in $A$. Let
$C = \{x \in A \cap B : x < b\}$. Then $C$ is a closed subset of $A$ and a closed subset of
$B$. Since $b \in B \cap \widehat{C}$, $f(\widehat{C})$ is in $B$ and is less than or equal to $b$. It can’t be in $A$
because it would then be in $A \cap B$ and less than $b$. This would put it in $C$, which
is impossible since $C \cap \widehat{C} = \emptyset$. On the other hand, $f(\widehat{C})$ must be in $A$ if $A \cap \widehat{C} \ne \emptyset$.
We conclude that $A \cap \widehat{C} = \emptyset$. Thus, each element of $A$ must be less than or equal
to some element of $C$. Since $C$ is closed in $A$, this implies that $A \subset C$ and, hence,
that $A = C$, which is a closed subset of $B$. The same argument shows that $B$ is a
closed subset of $A$ if $A$ is not a subset of $B$. This proves our claim.

Let $S_1 = \bigcup_{A \in \mathcal{F}} A$. If $x, y \in S_1$, then $x \in A$ and $y \in B$ for members $A$, $B$ of $\mathcal{F}$. One
of these sets contains the other, say $B \subset A$, and so $x$ and $y$ are both in a chain
$A \in \mathcal{F}$. This implies that either $x \le y$ or $y \le x$. Hence, $S_1$ is a chain. Each $A \in \mathcal{F}$
is a closed subset of $S_1$, since if $x \in S_1$ and $y \in A$ with $x \le y$, then $x$ is in some
$B \in \mathcal{F}$ and either $B$ is a closed subset of $A$ or $A$ is a closed subset of $B$. In either
case $x \in A$.

If $C$ is a closed subset of $S_1$ and $\widehat{C} \cap S_1 \ne \emptyset$, then there is an element $x$ in this
set and it is also in some $A$ which is a member of $\mathcal{F}$. Then $A \cap \widehat{C} \ne \emptyset$ and so $f(\widehat{C})$
is the smallest element of $A \cap \widehat{C}$. This makes it the smallest element of $S_1 \cap \widehat{C}$.
Thus, $S_1$ itself is a member of $\mathcal{F}$. In fact, it is clearly the largest member of $\mathcal{F}$.

If $\widehat{S_1} \ne \emptyset$, then $S_1 \cup \{f(\widehat{S_1})\}$ would be a larger set which is also a member of $\mathcal{F}$.
Since this is impossible, we conclude that no element of $S$ is larger than all elements
of $S_1$. By hypothesis, $S_1$ has an upper bound in $S$. Since it can’t be larger than
every element of $S_1$, it must be equal to some element of $S_1$ such that no element
of $S$ is larger. Hence, $S$ has a maximal element. This completes the proof. □


Another statement that is equivalent to the Axiom of Choice is called the Well-ordering
Theorem. We will prove that it follows from Zorn’s Lemma (which is
equivalent to the Axiom of Choice) and the Axiom of Choice. We leave the proof
that well ordering implies the Axiom of Choice to the exercises.


Theorem A.4.4 (Well-ordering Theorem). Each set can be given an order 
relation that is a well ordering. 


Proof. Given a set $X$, we define another set $S$ as follows. The elements of $S$ are
non-empty subsets of $X$ equipped with well orderings. That is, an element of $S$ is
a subset $A$ of $X$ together with a well ordering of $A$. Given two such elements $A$
and $B$, we say $A \le B$ if $A \subset B$ and the well ordering on $A$ agrees with the well
ordering on $B$ (when restricted to elements of $A$).


Using the Axiom of Choice, we also choose a choice function $\phi$ which assigns to
each non-empty subset $A$ of $X$ an element $\phi(A) \in A$.


A chain of elements of $S$ is a chain of subsets of $X$, each of which has a well
order that is compatible with the well orders on larger sets in the chain. As a result,
the union of such a chain of sets is equipped with a well order that is determined by
the well orders on the sets in the chain. This union is then an upper bound on the
chain in the order on $S$. Thus, $S$ is a partially ordered set satisfying the hypotheses
of Zorn’s Lemma. By Zorn’s Lemma, there is a maximal element of $S$. Such an
element is a maximal subset $Y$ of $X$ possessing a well order. However, if $Y \neq X$,
then we can adjoin the element $x = \phi(X \setminus Y)$, selected by the choice function $\phi$,
to $Y$ and declare it to be larger than
every element of $Y$. Then $Y \cup \{x\}$ would be a larger element of $S$ than $Y$, but $Y$
is a maximal element of $S$. The resulting contradiction shows that $Y$ must be $X$
and, therefore, $X$ has a well order. □
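
The final step of this proof can be written out explicitly. The following is only a sketch under stated assumptions: the symbols $\le_Y$ (the well order on $Y$), $\le'$ (the extended order) and $Y'$ are names introduced here for illustration and do not appear in the text, and $x$ denotes the element of $X \setminus Y$ that is adjoined to $Y$.

```latex
% Extended order on Y' = Y \cup \{x\}, with x \in X \setminus Y.
% \le_Y, \le' and Y' are illustrative names, not notation from the text.
\[
  u \le' v \iff
  \begin{cases}
    u \le_Y v, & u, v \in Y,\\
    \text{true}, & v = x,\\
    \text{false}, & u = x,\ v \in Y.
  \end{cases}
\]
% Every non-empty subset T of Y' has a least element with respect to \le':
\[
  \min T =
  \begin{cases}
    \min\nolimits_{\le_Y} (T \cap Y), & T \cap Y \neq \emptyset,\\
    x, & T = \{x\}.
  \end{cases}
\]
```

So $\le'$ is a well ordering of $Y'$ that agrees with $\le_Y$ on $Y$, which is what makes $Y'$ a strictly larger element of $S$. When filling in the details of the chain argument, it may also help to assume (this is an added assumption, not something stated in the text) that $A \le B$ further requires $A$ to be a closed subset of $B$; under that assumption a least element found in one member of a chain is still least in the union, and the last step is unaffected because $Y$ is closed in $Y'$.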


Consequences of the Axiom of Choice. Assuming the Axiom of Choice leads
to a large number of useful theorems. We have already seen how it leads to a proof
that if $S$ is an infinite set, then $\mathbb{N} \preceq S$. It also leads to the following:


Theorem A.4.5. If $S$ and $T$ are non-empty sets, then either $S \preceq T$ or $T \preceq S$.


Proof. Using the Well-ordering Theorem, we well order each of $S$ and $T$. Recall
that a subset is closed relative to an order relation on a set if, whenever it contains an
element, it contains all smaller elements. We consider all order-preserving bijections
from a closed subset of $S$ to a closed subset of $T$. There is at least one: the one
that sends the minimal element of $S$ to the minimal element of $T$.


If there is an order-preserving bijection $f : A \to B$ from a closed subset $A$ of $S$
to a closed subset $B$ of $T$, then it is the unique order-preserving injection of $A$ into $T$
with an image which is closed in $T$ (Exercise A.4.4). Thus, if $A_1$ is a closed subset of $S$ which
contains $A$ and if there is an order-preserving bijection $g$ from $A_1$ to some closed
subset of $T$, then $g$ agrees with $f$ on $A$. It follows that if $S_1$ is the union of all closed
subsets $A$ of $S$ for which such a bijection exists, then there is an order-preserving bijection of
$S_1$ onto a closed subset $T_1$ of $T$. If $S_1 = S$, then $S \preceq T$. If $T_1 = T$, then $T \preceq S$. If
neither of these things is true, then we can extend the bijection $S_1 \to T_1$ to larger
subsets by sending the minimal element of $S \setminus S_1$ to the minimal element of $T \setminus T_1$.
This is impossible, since $S_1 \to T_1$ is the maximal such bijection. Thus, either $S \preceq T$
or $T \preceq S$. □
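
The extension used at the end of this proof can be spelled out as follows. This is a sketch with names assumed here for illustration only: $f_1$ stands for the maximal order-preserving bijection of $S_1$ onto $T_1$, and $s_0$, $t_0$ stand for the minimal elements of $S \setminus S_1$ and $T \setminus T_1$, which exist because $S$ and $T$ are well ordered and both differences are assumed non-empty.

```latex
% f_1, s_0, t_0 and \tilde f are illustrative names, not notation from the text.
\[
  \tilde f : S_1 \cup \{s_0\} \to T_1 \cup \{t_0\}, \qquad
  \tilde f(s) =
  \begin{cases}
    f_1(s), & s \in S_1,\\
    t_0, & s = s_0.
  \end{cases}
\]
% Closed domain: any s < s_0 cannot lie in S \setminus S_1, by minimality of s_0,
% so s \in S_1; hence S_1 \cup \{s_0\} is closed in S (and likewise for T).
% Order preserving: each s \in S_1 satisfies s < s_0 because S_1 is closed and
% s_0 \notin S_1, and f_1(s) \in T_1 satisfies f_1(s) < t_0 for the same reason.
\[
  s \in S_1 \implies s < s_0 \quad\text{and}\quad \tilde f(s) = f_1(s) < t_0 = \tilde f(s_0).
\]
```

Since $\tilde f$ would be an order-preserving bijection between closed subsets that strictly extends $f_1$, it contradicts the maximality of $S_1$, so one of $S_1 = S$ or $T_1 = T$ must hold. In the first case $f_1$ is an injection of $S$ into $T$; in the second, its inverse is an injection of $T$ into $S$.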


Thus, given two sets that are not similar, one of them has a larger cardinal 
than the other. This imposes a linear ordering on the cardinality of sets. It is, in 
fact, a well order, as the next theorem shows. Its proof is left to the exercises. 


Theorem A.4.6. Given any collection of sets, there is one with cardinal less than 
or equal to the cardinal of each of the others. 


The Axiom of Choice has a wealth of other important consequences. In the 
form of Zorn’s Lemma, it can be used to prove that every infinite-dimensional vector 
space has a basis. It is used to prove that the Cartesian product of an infinite
collection of non-empty sets is itself non-empty and that the Cartesian product of
an infinite collection of compact topological spaces is compact. In the theory of 
normed linear spaces, it is used to prove the famous Hahn-Banach theorem on the 
extension of bounded linear functionals. 


In general, the Axiom of Choice is a powerful tool in any area of mathematics 
that involves infinite-dimensional spaces. We didn’t use it or refer to it in the main 
body of this text, because our focus was on finite-dimensional spaces. For more on 
this topic see Gleason’s book [3]. 


1. In a certain town all men are shaved. There is one barber and he shaves exactly
those men who do not shave themselves. Show that this statement is a paradox;
that is, it involves a logical contradiction.

2. Argue that to allow the set of all sets to be a set leads directly to a Russell
paradox.


3. Use Zorn’s Lemma to prove that every vector space has a basis (a maximal 
linearly independent subset). 

4. Given well-ordered sets A and T, prove that there is at most one order-preserving 
injection from A to T which has a closed subset of T as image. 

5. Prove that Zorn’s Lemma implies the Axiom of Choice.

6. Prove that the Well-ordering Theorem implies the Axiom of Choice.

7. Prove Theorem A.4.6.


Bibliography 


[1] Apostol, T. M., Mathematical Analysis, Addison-Wesley, 1974.
[2] Dieudonné, J., Foundations of Modern Analysis, Academic Press, 1960.
[3] Gleason, A., Fundamentals of Abstract Analysis, Addison-Wesley, 1966.
[4] Kelley, J. L., General Topology, Van Nostrand, 1955.
[5] Lang, S., Analysis I, Addison-Wesley, 1968.
[6] Loomis, L. H., and Sternberg, S., Advanced Calculus, Addison-Wesley, 1968.
[7] Murdock, D. C., Linear Algebra for Undergraduates, Wiley, 1957.
[8] Ross, K. A., Elementary Analysis: The Theory of the Calculus, Springer-Verlag, 1980.
[9] Rudin, W., Principles of Mathematical Analysis, 3rd Ed., McGraw-Hill, 1976.
[10] Spivak, M., Calculus on Manifolds, Benjamin, 1965.
[11] Taylor, A. E., Advanced Calculus, Ginn and Company, 1955.
[12] Wade, W. R., An Introduction to Analysis, 3rd Ed., Pearson Prentice Hall, 2004.
[13] Widder, D. V., Advanced Calculus, 2nd Ed., Prentice Hall, 1961.




Analysis plays a crucial role in the undergraduate curriculum.
Building upon the familiar notions of calculus, analysis introduces
the depth and rigor characteristic of higher mathematics
courses. Foundations of Analysis has two main goals. The first
is to develop in students the mathematical maturity and sophistication
they will need as they move through the upper division
curriculum. The second is to present a rigorous development of
both single and several variable calculus, beginning with a study
of the properties of the real number system.


The presentation is both thorough and concise, with simple, straightforward
explanations. The exercises differ widely in level of abstraction and level of difficulty.
They vary from the simple to the quite difficult and from the computational
to the theoretical. Each section contains a number of examples designed to
illustrate the material in the section and to teach students how to approach the
exercises for that section.


The list of topics covered is rather standard, although the treatment of some of 
them is not. The several variable material makes full use of the power of linear 
algebra, particularly in the treatment of the differential of a function as the best 
affine approximation to the function at a given point. The text includes a review of 
several linear algebra topics in preparation for this material. In the final chapter, 
vector calculus is presented from a modern point of view, using differential forms 
to give a unified treatment of the major theorems relating derivatives and integrals:
Green's, Gauss's, and Stokes's Theorems.


At appropriate points, abstract metric spaces, topological spaces, inner product 
spaces, and normed linear spaces are introduced, but only as asides. That is, the 
course is grounded in the concrete world of Euclidean space, but the students 
are made aware that there are more exotic worlds in which the concepts they are 
learning may be studied. 


For additional information and updates on this book, visit
www.ams.org/bookpages/amstext-18

ISBN 978-0-8218-8984-8

AMSTEXT/18

This series was founded by the highly respected
mathematician and educator, Paul J. Sally, Jr.


